(57) Abstract

The present invention provides a method of arbitrary sequence
oligonucleotide fingerprinting (ASOF), a technology which
eliminates gel electrophoresis as a step in polymorphic marker
analysis, species identification and transcriptional profiling. ASOF
greatly increases the speed and throughput of analysis with a
concomitant decrease in cost. Furthermore, the miniaturization and
automation of ASOF analysis lead to greatly increased throughput
of nucleic acid analysis.
ARBITRARY SEQUENCE OLIGONUCLEOTIDE FINGERPRINTING

BACKGROUND OF THE INVENTION 

Field of the Invention 
The present invention relates generally to the fields of
molecular biology and nucleic acid analysis. More specifically, the
present invention relates to a novel method of genetic analysis using 
arbitrary sequence oligonucleotide fingerprinting. 

Description of the Related Art

A certain amount of DNA sequence variation occurs 
naturally within a population of individuals. At many chromosomal 
positions, the frequency of sequence variation within a population is 
great enough to yield useful DNA markers, and the occurrence of a
polymorphic allele at a frequency of about 10% is generally
considered useful for mapping purposes (1). Analysis of DNA 
polymorphisms has been extremely valuable for identifying genetic 
markers tightly linked to genes associated with phenotypic traits. 
The use of gel electrophoresis to detect restriction fragment length
polymorphism (RFLP) has yielded thousands of mapped polymorphic
DNA markers in various species. The most frequent type of genetic 
change associated with an RFLP marker is point mutation within the 
recognition sequence of a restriction enzyme. 

Although restriction fragment length polymorphism
analysis remains a widely used method for detecting DNA sequence 
polymorphism, several useful variations on the fragment length 
theme have recently been introduced. The existence of variable
number tandem repeat (VNTR) or "microsatellite" sequences
scattered throughout genomic DNA has been exploited in the
identification of polymorphic markers (2,3). Microsatellite probes
have been used to detect polymorphisms in length of restriction
fragments (2) and PCR products (3). Variable length "short tandem
repeats" (STRs) such as (CA)n are highly polymorphic and serve as
informative markers (4,5).

Another recent advance in polymorphic marker analysis 
is single short primer PCR, or "random amplified polymorphic DNA" 
(RAPD) marker analysis. Conduct of PCR with genomic DNA using
single short (8-10mer) primers of arbitrary sequence generates a
product that can be used in gel electrophoretic fingerprint analysis to
generate numerous polymorphic markers (6,7). Although variable
number tandem repeat markers, short tandem repeats and RAPD
markers have significantly increased the rate of polymorphic marker
discovery and the throughput of polymorphic marker analysis, their
analysis is limited by the requirement of labor intensive gel
electrophoresis, which typically requires several hours of time and
accommodates a relatively small number of tests at one time (less
than 100).

Microbial identification is another analytical task that
benefits from the present invention. Identification of bacterial, viral
and mycotic species, strains and subtypes is a key concern in clinical 
microbiology, for diagnosis of infectious disease, selection of effective 
pharmaceutical treatment, and epidemiological investigation of the 
source and spread of infectious disease. Microbial identification is
also a vital capability in the detection and management of biological
warfare agents. Microbial identification is also important in
agricultural, industrial and environmental biomonitoring, for example
in the detection of pathogens that reduce agricultural productivity as
well as microbes that put nutrients into the soil, in the monitoring of 
industrial bioprocesses, and in the assessment of biodegradation 
capacity in soil and waste treatment facilities. Microbial identification 
typically involves time consuming and expensive culturing and
biochemical procedures, as well as costly and complex immunological
tests. DNA sequencing and PCR analysis can also be performed to 
achieve accurate microbial identification and typing, but like current 
DNA typing procedures, these microbial DNA diagnostic tests require 
gel electrophoretic analysis, which is time consuming and labor
intensive and accommodates a relatively low sample throughput.
Analysis of microbial populations, important in environmental and 
industrial settings, is currently a daunting task, typically requiring 
extensive culturing and a battery of biochemical tests, supplemented 
by crude classification by visual inspection. Many of the microbial
species in environmental samples are not readily culturable, making
detection and identification extremely difficult. 

Analysis of gene expression is another area that benefits 
from the present invention. Transcriptional profiling, i.e., analysis of 
the relative abundance of messenger RNA transcribed from different
genes, is critical to the understanding of patterns of gene expression
that are associated with all biological processes, including 
development, differentiation, response to environmental stresses, 
and other cellular and organismal functions of interest to basic 
scientists. The ability to analyze patterns of gene expression can lead 
to discovery of new genes associated with biological processes. A
detailed understanding of gene regulation at the transcriptional level 
is also a premier concern of the pharmaceutical industry, enabling 
identification of genetic targets for drug development and leading to 
the understanding of the well known heterogeneity in the way
different individuals respond to pharmaceutical interventions. 
Transcriptional profiling is currently conducted by the techniques of
"differential display" (Liang, P. and Pardee, A.B. (1992) Science
257:967-971; Liang, et al., (1994) Nucl. Acids Res. 22:5763-5764; 
Prashar, Y. and Weissman, S.M. (1996) Proc. Nat'l. Acad. Sci., U.S.A. 
93:659-663.) and "representational difference analysis" (Hubank, M. 
and Schatz, D.G. (1994) Nucl. Acids Res. 22:5640-5648; Lisitsyn, N.A. 
(1995) Trends Genet. 11:303-307), both of which involve PCR, gel 
electrophoretic analysis of DNA fragments, and a variety of other 
complex manipulations. A need clearly exists for new technology that 
enables more robust, rapid and cost effective quantitation of a very 
large number of gene transcripts. 

The prior art is deficient in the lack of effective means for 
the rapid, simultaneous analysis of a large number of DNA markers, 
for rapid identification of species, strains, subtypes and gender,
and for rapid transcriptional profiling. The present invention fulfills 
this longstanding need and desire in the art. 

SUMMARY OF THE INVENTION 

The arbitrary sequence oligonucleotide fingerprinting 
technique of the present invention replaces gel electrophoresis with 
hybridization to a miniature array of numerous oligonucleotide 
probes, and enables simultaneous analysis of hundreds or thousands 
of DNA markers. DNA sequence polymorphisms (DNA markers) are
important tools in genetic analysis, serving as genetic markers in 
agricultural breeding programs, facilitating the discovery of genes 
associated with genetic diseases or other traits, documenting the 
identity of individual humans, animals or plants, and indicating the
extent of genetic diversity among populations. The present invention 
discloses an improved procedure for DNA marker analysis and other 
forms of nucleic acid sequence analysis, termed "arbitrary sequence 
oligonucleotide fingerprinting" (ASOF), which enables the rapid,
simultaneous analysis of a large number of DNA markers. The
expected high information content of arbitrary sequence 
oligonucleotide fingerprinting analysis facilitates many kinds of 
genetic analyses. 

This invention provides an improved process for
comparing nucleic acids extracted from different biological samples.
One application of the invention is in the field of DNA marker 
analysis, wherein the identity of individuals is assessed through 
"DNA typing" (e.g., in forensic "identification"), and genes associated 
with specific phenotypic traits are identified and mapped to specific
sites on the chromosomes. In the process of arbitrary sequence
oligonucleotide fingerprinting, variations in the DNA sequence of 
different individuals of a species ("DNA sequence polymorphisms") 
are revealed by differences in the quantitative pattern of binding of 
DNA fragments prepared from different individuals to an array of a
few hundred to a few thousand surface-tethered oligonucleotide
probes of arbitrary nucleotide sequence. 

The arbitrary sequence oligonucleotide fingerprinting 
technique of the present invention has important commercial 
application in several fields. For example, applications of the various 
embodiments of the technique of the present invention include DNA
fingerprinting for individual identification - applied in forensic and 
paternity testing, DNA typing of prison and military populations, 
gender determination in plants and animals, genotyping of horses, 
cattle, poultry, wildlife species, proprietary plant cultivars,
genetically engineered agricultural varieties, and eventually, 
household pets and every newborn child. Secondly, the techniques of 
the present invention allow simultaneous analysis of large numbers 
of DNA markers for, e.g., tracking down genes associated with genetic
diseases or genes conferring susceptibility or resistance to infectious
or genetic diseases or environmental stress, and for discovery of 
genes associated with desirable traits in plants and animals leading 
to commercial opportunities in medicine and agriculture. Thirdly, the 
techniques of the present invention allow profiling of gene 
expression (whereby hybridization pattern reflects relative
abundance of different mRNA species), for example, to identify and 
isolate genes associated with biological responses of interest to the 
pharmaceutical industry. Fourthly, the techniques of the present 
invention allow assessment of genetic and/or biological diversity, e.g. 
addressing environmental concerns, and supporting the
establishment of resources for discovery of new biotechnology 
products. Fifthly, the techniques of the present invention allow 
analysis of microbial population dynamics which is relevant to waste 
treatment, bioremediation and microbial and chemical process 
control. Finally, the techniques of the present invention allow
microbial identification for infectious disease diagnostics and 
ecosystem surveillance. 

A number of problems are solved by the present 
invention. Anticipated advantages of the genosensor-based arbitrary 
sequence oligonucleotide fingerprinting procedure over current gel
electrophoresis-based DNA typing methods include: (1) greater speed 
of analysis; (2) higher throughput of analysis (ability to process 
larger numbers of samples per work day); (3) lower cost per analysis; 
(4) greater statistical reliability due to much higher information 
content; and (5) in some embodiments, direct analysis of complex
nucleic acid sequences without the use of DNA amplification. A key 
feature, common to all embodiments of the arbitrary sequence 
oligonucleotide fingerprinting technique of the present invention, is 
the use of a set of arbitrary sequence oligonucleotide probes, each 
sequence located at a specific site on a hybridization support via 
binding of the short strands to the surface at one end. 

Another significant embodiment of the present invention 
is in the use of arbitrary sequence oligonucleotide arrays for gene 
expression profiling, which constitutes a strategy of "differential 
display on a chip." Bulk messenger RNA is extracted from cells and
subjected to reverse transcription to form cDNA. PCR is then
performed to generate subsets of expressed sequences, as in the 
prior art of differential display, and instead of displaying the PCR 
fragments by gel electrophoresis, in the present invention the PCR 
mixture is hybridized with an array of arbitrary sequence 
oligonucleotides to generate a hybridization fingerprint which 
quantitatively reflects the relative abundance of different mRNA 
species. The length of oligonucleotide probes arrayed across the 
genosensor chip can be adjusted to accommodate variations in total 
sequence complexity of the PCR fragments, as is done in the 
application of ASOF in polymorphic marker analysis, so that on 
average, each transcript hybridizes to one or a few sites across the 
array. Changes in gene expression will result in changes in the 
hybridization signal intensity at different positions across the
genosensor array, and target sequences bound to the relevant sites
can be released (melted off by hot water) for further analysis,
including cloning and sequencing. In a preferred embodiment of the
present invention for gene expression profiling, the array of
arbitrary sequence oligonucleotide probes is formed in the 
"flowthrough genosensor" (Beattie, K.L. (1994) Microfabricated, 
Flowthrough Porous Apparatus for Discrete Detection of Binding 
Reactions, patent application PCT/US94/12282, filed Oct. 27, 1994;
Beattie, et al., (1995) Clin. Chem. 41:700-706.), in which probes are
immobilized within hybridization cells containing densely arrayed 
smooth channels or pores of 1-10 micron diameter, extending across 
a silicon or glass wafer typically 500 microns thick. Dilute nucleic 
acid solutions can be analyzed by flowing them through the porous
glass hybridization array, and the quantity of bound material per
unit cross section is on the order of 100 times that of the flat surface 
genosensor array, which greatly increases the sensitivity and 
dynamic range of the analysis. Also advantageous for transcriptional 
profiling using the present invention, the flowthrough genosensor
configuration facilitates recovery of hybridized strands for further
analysis. The present invention can also be advantageously applied 
to the profiling of genomes and expressed genes from mixed 
populations of organisms, for example, microbial populations in soil 
samples and waste treatment facilities. By using arbitrary sequence
probes of length appropriate for the total genetic complexity of the
sample, a specific hybridization fingerprint may be produced from
the environmental sample which reflects the microbial population,
and a change in the microbial population can be seen as a change in
the hybridization fingerprint. 

Additional embodiments of the present invention are
disclosed which enable direct hybridization fingerprinting of highly 
complex nucleic acid mixtures, without the necessity of preparing a 
subset of sequences by PCR. The total nucleic acid sample (genomic 
DNA or RNA of total genetic complexity of millions or billions of bases)
is extracted from cells or mixed populations, labeled, and hybridized 
to arrays of longer oligonucleotide probes (of length 12-18 bases) to 
generate a complex fingerprint reflecting representative sequences 
from the entire nucleic acid sample. Obviously, much longer 
hybridization times are required for the direct fingerprinting 
embodiment of arbitrary sequence oligonucleotide fingerprinting 
than in the embodiments that include preparation of a subset of 
sequences by PCR. The flowthrough genosensor configuration, which 
enables analysis of dilute nucleic acid samples flowed through the 
porous array, is therefore a preferred hybridization substrate for the 
direct fingerprinting of nucleic acid samples of high genetic 
complexity. The direct fingerprinting embodiment of the present 
invention is a particularly preferred strategy for analysis of 
microbial genomes and messenger RNA populations, where the total 
genetic complexity is typically on the order of millions of bases. 

For direct analysis of nucleic acid samples of total genetic 
complexity in the billion base range, such as genomic DNA of higher 
eukaryotes or bulk messenger RNA extracted from complex mixtures 
of microorganisms, the following embodiment of the current 
invention is preferred. Arrays of up to several thousand arbitrary 
sequence "capture probes" of length 7-9 bases are prepared, 
preferably in a flowthrough (porous glass) hybridization support. The 
complex nucleic acid sample is then mixed with one or more labeled 
oligonucleotides (also of arbitrary sequence and length 7-9 bases) 
and hybridized to the array of capture probes. The hybridization is
carried out at an ionic strength and temperature at which short
duplex regions (7-9 base pairs) are unstable but longer duplex
regions (14-18 base pairs) are stable. Under these conditions, it is
known that the nucleic acid strands will stably hybridize to the
oligonucleotide array only when the capture probe and the labeled
probe hybridize to a target strand in a tandem fashion, that is, form a
continuous stretch of base-stacked duplex of combined length
14-18 base pairs in this embodiment of the present invention
(Khrapko, et al., (1991) DNA Sequence 1:375-388). In this way, the total
effective length of the probe is long enough to produce a meaningful 
hybridization fingerprint of the entire complex nucleic acid. If 
shorter probes were used, essentially all of the probes would 
hybridize at multiple sites within the highly complex nucleic acid
target, and a totally occupied, meaningless fingerprint would be
produced. 
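
The tandem hybridization requirement described above can be illustrated with a minimal Python sketch. It is an illustration under simplifying assumptions, not the procedure of the invention: hybridization is modeled as exact Watson-Crick matching on plain strings, duplex stability is reduced to the rule that a target is retained only where a capture probe and a labeled probe bind at directly adjacent sites, and the function names and example sequences are hypothetical.

```python
# Assumption: exact-match string model of tandem hybridization.
# Real duplex stability depends on temperature, ionic strength and
# base stacking, none of which is modeled here.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tandem_hits(target, capture_probes, labeled_probes):
    """Map each capture probe to the labeled probes that bind the
    target at a site directly adjacent to the capture probe's site."""
    hits = {}
    for capture in capture_probes:
        capture_site = reverse_complement(capture)
        for labeled in labeled_probes:
            labeled_site = reverse_complement(labeled)
            # The two sites may abut in either order along the target.
            joined = (capture_site + labeled_site, labeled_site + capture_site)
            if any(site in target for site in joined):
                hits.setdefault(capture, []).append(labeled)
    return hits


# Hypothetical 7-base probes that bind adjacent sites on a toy target.
capture = reverse_complement("ACGTACG")
labeled = reverse_complement("GATCCAT")
contiguous_target = "GGG" + "ACGTACG" + "GATCCAT" + "CCC"
gapped_target = "GGG" + "ACGTACG" + "TT" + "GATCCAT" + "CCC"

print(tandem_hits(contiguous_target, [capture, "AAAAAAA"], [labeled]))
print(tandem_hits(gapped_target, [capture], [labeled]))
```

In this model the gapped target yields no signal even though both 7-base sites are present, which mirrors the statement that neither short duplex alone is stable.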

The key requirement that is fulfilled by adjusting the 
effective probe length to "match" the total genetic complexity of the 
target is to produce hybridization fingerprints in which only a
fraction (typically 1/4 to 2/3) of the hybridization sites are occupied
by hybridized strands, so that on average, only one target sequence 
is bound within each hybridization cell. In the tandem probe 
embodiment of the present invention, the frequency of occurrence of 
contiguously stacked capture/labeled probes hybridized to the target
strand can be conveniently adjusted (to produce a meaningful
fingerprint) by varying the number of labeled oligonucleotide probe 
sequences that are included in the hybridization mixture. 
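
The relationship between effective probe length and target complexity can be sketched numerically. The following Python example is a rough estimate under stated assumptions that are not part of the original disclosure: the target is treated as a random sequence with equal base frequencies, only one strand and only perfect matches are counted, and the number of matches per probe is taken to be Poisson distributed. The function names are hypothetical.

```python
import math

# Assumptions: random target sequence, equal base frequencies,
# single strand, perfect matches only, Poisson-distributed hit counts.


def expected_hits(complexity, probe_len):
    """Expected number of sites in the target matching one probe."""
    return complexity / 4 ** probe_len


def occupied_fraction(complexity, probe_len):
    """Estimated fraction of array sites binding at least one target."""
    return 1.0 - math.exp(-expected_hits(complexity, probe_len))


def suitable_lengths(complexity, low=0.25, high=2 / 3, max_len=30):
    """Effective probe lengths whose estimated occupancy falls within
    the 1/4 to 2/3 range given in the text."""
    return [
        n
        for n in range(4, max_len + 1)
        if low <= occupied_fraction(complexity, n) <= high
    ]


for complexity in (1e6, 3e9):
    print(complexity, suitable_lengths(complexity))
```

Under these assumptions a target of 3 x 10^9 bases gives an effective length of 16 bases, which falls inside the 14-18 base combined duplex of the tandem probe embodiment; counting both strands or mismatched duplexes would shift the estimate.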

The information content of the hybridization fingerprint 
can be greatly enhanced by using mixtures of labeled probes bearing 






WO 97/22720 PCT/US96/20628 


a variety of distinguishable fluorophores, to simultaneously create a 
multiplicity of distinct fingerprints in the same hybridization 
reaction. Another useful feature of the tandem probe embodiment 
for fingerprinting of complex nucleic acids is that the combination of 
capture and labeled probes, hybridizing in tandem with the target
strand, immediately defines a sequence of 14-18 bases, which can be 
used to create a primer for further analysis of bound strands by 
dideoxy sequencing or PCR. 

In addition to using single short (9-10mer) arbitrary
sequence primers for amplification of a specific subset of the genome
(prior to hybridization to the array of arbitrary sequence probes), the 
present invention also discloses the use of mixtures of longer PCR 
primers (e.g., 100 13mers of arbitrary sequence), at higher 
temperature of annealing, to obtain a more reproducible amplified
genomic subset.

In addition to using arbitrary sequence PCR, the present 
invention also discloses the use of mixtures of longer PCR primers 
directed to known regions spaced across the genome (e.g., multiple 
pairs of 20-30mers) to amplify specific, known genomic regions. The
products would then be hybridized to arrays of arbitrary sequence
probes, to obtain fingerprints that reveal sequence polymorphisms 
within the known regions. Regardless of the method chosen to 
prepare the genomic fragments, the hybridization fingerprint will be 
specific and quantitative, such that even a two-fold change in
relative hybridization signal, such as that associated with
homozygous vs. heterozygous condition, can be distinguished. An 
important aspect of the invention is the stepwise process whereby 
the combination of PCR with arbitrary array hybridization is first 
used to discover new sequence polymorphisms, then the specific 
combinations of primers and probes that test the new
polymorphisms are implemented in a directed fashion to 
simultaneously analyze hundreds to thousands of sequence 
polymorphisms. Another important point not taught by the prior art 
is that the reproducible hybridization pattern seen for a given set of
PCR fragments and probes is not entirely due to perfectly base paired 
duplex regions (i.e., Watson-Crick pairing between oligonucleotide 
probes and target strands). Many of the hybridization signals will 
involve imperfectly paired duplexes, containing one or more base 
mismatches or even regions of tertiary structure. The existence of
imperfect duplexes is also influenced by sequence polymorphism, 
and as long as the hybridization patterns are reproducible, it does not 
matter whether they represent perfect matches. The well known 
patent issued to Dr. Southern, for example, specifically refers to 
perfect hybrids.

In a seventh embodiment of the present invention, there 
is provided a method for direct genomic fingerprinting of DNA 
samples of high genetic complexity, comprising the steps of: 
extracting genomic DNA from a biological sample; adding at least one 
labeled oligonucleotide probe of arbitrary sequence to the extracted
DNA and hybridizing the mixture with an array of arbitrary sequence 
capture probes, using conditions of temperature and ionic strength 
under which neither the labeled probe(s), nor capture probes alone 
will stably hybridize with the DNA target, but under which capture 
and labeled probes, when tandemly hybridized to a target strand to
form a longer, contiguously base-stacked combined duplex region, 
will result in stable capture of the target strand; and comparing the 
hybridization fingerprint with genomic fingerprints obtained from 
different biological samples. 

In an eighth embodiment of the present invention, there
is provided a method for direct transcriptional profiling of nucleic 
acid samples of high genetic complexity, comprising the steps of: 
extracting messenger RNA from a biological sample; adding at least 
one labeled oligonucleotide probe of arbitrary sequence to the
extracted RNA and hybridizing the mixture with an array of 
arbitrary sequence capture probes, using conditions of temperature 
and ionic strength under which neither the labeled probe(s), nor 
capture probes alone will stably hybridize with the RNA target, but
under which capture and labeled probes, when tandemly hybridized
to a target strand to form a longer, contiguously base-stacked 
combined duplex region, will result in stable capture of the RNA 
transcript; and comparing the hybridization fingerprint with RNA 
fingerprints obtained from different biological samples. 

In a ninth embodiment of the present invention, there is
provided a method for direct fingerprint analysis of nucleic acid
samples of high genetic complexity, comprising the steps of: 
extracting DNA or RNA from a biological sample; adding at least one 
labeled oligonucleotide probe of arbitrary sequence to the extracted
nucleic acid and hybridizing the mixture with an array of arbitrary
sequence capture probes, using conditions of temperature and ionic 
strength under which neither the labeled probe(s), nor capture 
probes alone will stably hybridize with the target strands, but under 
which capture and labeled probes, when tandemly hybridized to a
target strand to form a longer, contiguously base-stacked combined
duplex region, will result in stable capture of the target strand; 
comparing the hybridization fingerprint with fingerprints obtained 
from different biological samples; eluting bound target strands from 
any desired hybridization cell in the array, preferably by applying 
hot water to the desired location in the array; and further analyzing
the eluted strands by methods selected from the group consisting of 
cloning, PCR or dideoxy sequencing, using (if desired) the combined 
sequence of the capture and labeled probes to define a longer primer 
for amplification or dideoxy sequencing.

Thus, in accordance with the above-described advantages 
and desirable features of the invention, in one embodiment of the 
present invention, there is provided a method of detecting 
polymorphisms between samples of genomic DNA, comprising the 
steps of: amplifying a first subset of genomic DNA sequences by a
polymerase chain reaction using one or more oligonucleotide primers 
of arbitrary sequence; labeling said first amplified subset of genomic 
DNA; combining said first amplified subset of genomic DNA with a 
two-dimensional array of surface-bound oligonucleotide probes 
under hybridizing conditions to form a first quantitative
hybridization fingerprint for said first subset of genomic DNA 
sequences; amplifying a second subset of genomic DNA sequences by 
a polymerase chain reaction using said one or more oligonucleotide 
primers of arbitrary sequence; labeling said second amplified subset 
of genomic DNA; combining said second amplified subset of genomic
DNA with said two-dimensional array of surface-bound 
oligonucleotide probes under hybridizing conditions to form a second 
quantitative hybridization fingerprint for said subset of genomic DNA 
sequences; comparing said first quantitative hybridization 
fingerprint to said second quantitative hybridization fingerprint; and
detecting polymorphisms in said samples of genomic DNA by
detecting differences between said first quantitative hybridization
fingerprint and said second quantitative hybridization fingerprint.
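
The comparison step of this embodiment can be sketched as follows. This is a minimal Python illustration under assumptions that are not specified in the text: each fingerprint is represented as a mapping from array position to a background-corrected signal intensity, a two-fold ratio is used as the threshold (chosen to match the two-fold homozygous versus heterozygous difference mentioned above), and a small floor value avoids division by zero at empty sites. All names and intensity values are hypothetical.

```python
# Assumptions: fingerprints are dicts of (row, column) -> intensity,
# already background-corrected and scaled to comparable totals.


def compare_fingerprints(first, second, fold_threshold=2.0, floor=1.0):
    """Return array sites whose signal differs between two fingerprints
    by at least fold_threshold, with the two intensities compared."""
    differing = []
    for site in sorted(first.keys() | second.keys()):
        a = max(first.get(site, 0.0), floor)
        b = max(second.get(site, 0.0), floor)
        ratio = max(a, b) / min(a, b)
        if ratio >= fold_threshold:
            differing.append((site, a, b))
    return differing


# Hypothetical intensities for three sites on two arrays.
first_fingerprint = {(0, 0): 100.0, (0, 1): 50.0, (1, 0): 0.0}
second_fingerprint = {(0, 0): 110.0, (0, 1): 100.0, (1, 0): 40.0}

for site, a, b in compare_fingerprints(first_fingerprint, second_fingerprint):
    print(site, a, b)
```

A ratio test is used here rather than a simple presence/absence call because the fingerprints are described as quantitative; any real analysis would also need replicate measurements to set the threshold.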

In an additional embodiment of the present invention,
there is provided a method of detecting polymorphisms in a genomic 
DNA sample, comprising the steps of: amplifying a first subset of 
genomic DNA sequences by a polymerase chain reaction using a 
multiplicity of defined sequence oligonucleotide primer pairs
directed toward a corresponding multiplicity of known genomic 
regions; labeling said first amplified subset of genomic DNA; 
combining said first amplified subset of genomic DNA with a
two-dimensional array of surface-bound oligonucleotide probes under
hybridizing conditions to form a first quantitative hybridization
fingerprint for said first subset of genomic DNA sequences; 
amplifying a second subset of genomic DNA sequences by a 
polymerase chain reaction using said multiplicity of defined sequence 
oligonucleotide primer pairs directed toward a corresponding
multiplicity of known genomic regions; labeling said second amplified
subset of genomic DNA; combining said second amplified subset of 
genomic DNA with said two-dimensional array of surface-bound 
oligonucleotide probes under hybridizing conditions to form a second 
quantitative hybridization fingerprint for said subset of genomic DNA
sequences; comparing said first quantitative hybridization fingerprint
to said second quantitative hybridization fingerprint; and detecting 
polymorphisms in said samples of genomic DNA by detecting 
differences between said first quantitative hybridization fingerprint 
and said second quantitative hybridization fingerprint.

In yet another embodiment of the present invention,
there is provided a method for profiling of gene expression at the
level of transcription, comprising the steps of: extracting RNA from a 
biological sample; conducting reverse transcriptase-arbitrary primer 
PCR to amplify subsets of expressed sequences; labeling said 
amplified subsets of expressed sequences; hybridizing the labeled,
amplified subsets of expressed sequences with an array of 
oligonucleotide probes of arbitrary sequence to produce a 
quantitative hybridization fingerprint; and detecting differences in 
gene expression from comparing said quantitative hybridization 
fingerprint with quantitative hybridization fingerprints obtained 
from other experiments performed previously for other biological
samples. 

In another aspect of the present invention, there is 
provided an improved method of preparing oligonucleotide arrays 
for use in hybridization analyses, comprising the steps of: chemically
synthesizing a desired set of oligonucleotide probes using 3'-amino-C3
controlled pore glass support material to produce completed
desired oligonucleotides; cleaving said completed desired 
oligonucleotides from said support material in concentrated 
ammonium hydroxide to yield oligonucleotides bearing 
aminopropanol groups at their 3'-termini; cleaning a glass or silicon 
dioxide surface with organic solvents and drying at elevated 
temperature; applying a quantity of oligonucleotides bearing 
aminopropanol groups at their 3'-termini in aqueous solution to said 
surface of said clean, dry glass or silicon dioxide; allowing covalent 
bonding of said oligonucleotides bearing aminopropanol groups at 
their 3'-termini to said surface through terminal aminopropanol 
functions; and removing unbound oligonucleotides from the surface 
by washing with water. 

Other and further aspects, features, and advantages of the 
present invention will be apparent from the following description of 
the presently preferred embodiments of the invention given for the 
purpose of disclosure. 

BRIEF DESCRIPTION OF THE DRAWINGS



So that the matter in which the above-recited features, 
advantages and objects of the invention, as well as others which will 
become clear, are attained and can be understood in detail, more 
particular descriptions of the invention briefly summarized above 
may be had by reference to certain embodiments thereof which are 
illustrated in the appended drawings. These drawings form a part of 
the specification. It is to be noted, however, that the appended 
drawings illustrate preferred embodiments of the invention and 
therefore are not to be considered limiting in their scope. 

Figure 1 illustrates graphically the method of arbitrary 
sequence oligonucleotide fingerprinting. 

Figure 2 shows the gel electrophoretic display of DNA 
fragments produced by single primer PCR. 

Figure 3 shows the formation of 3'-aminopropanol-derived
oligonucleotides.

Figure 4 depicts a scheme for covalent linkage of
3'-aminopropanol oligonucleotides to plain glass surfaces.

Figure 5 shows the hybridization patterns obtained in the 
conduct of arbitrary sequence oligonucleotide fingerprinting using 
DNA extracted from three different individuals. 

Figure 6 illustrates the direct fingerprinting of complex 
nucleic acid samples using a tandem hybridization strategy. 

DETAILED DESCRIPTION OF THE INVENTION 

To understand the workings of the present invention, it is
important to compare the expected throughput of arbitrary sequence
oligonucleotide fingerprinting analysis with that of current DNA
marker analysis techniques (RFLP, STRP and RAPD). For this
comparison one can assume that a typical laboratory will conduct
DNA marker analysis (by either genosensor-based or gel-based
methods) on 200 samples per day. One can assume further that 200
arbitrary sequence oligonucleotide fingerprinting analyses, involving
hybridization of arbitrary PCR products to an array of 200 miniature
genosensor chips, can be achieved in equivalent time and space as a
single analysis using 200 electrophoretic lanes. Additional
assumptions include: when new polymorphic markers are being
searched for, twenty tests are carried out per day with ten different
individuals (200 samples total). In discovery of new RFLP markers,
one assumes that each lane contains an average of 20 bands; the
restriction site is 5 bases (average of 4-base and 6-base cutters). For
STR and RAPD marker analysis one can assume that 50% of all lanes
will reveal a polymorphism. For analysis of a known polymorphism,
one should assume that each lane will test a single polymorphism in
gel-based methods and that each genosensor containing a 50x50
array of probes will test an average of 1,000 polymorphic sites.
Based on the above, throughput estimations are made for two cases:
(i) discovery of new polymorphic markers; and (ii) subsequent
analysis of known polymorphic markers. The following is predicted:
# new polymorphisms discovered per day

RFLP    STR     RAPD    ASOF
10      10      10      500

# known polymorphisms analyzed per day

RFLP    STRP    RAPD    ASOF
200     200     200     200,000

The above predictions suggest a fifty-fold increase in 
throughput for genosensor-based arbitrary sequence oligonucleotide 
fingerprinting marker analysis of the present invention compared
with standard gel-based analyses, during the identification of new 
polymorphic markers. The increase in throughput for analysis of 
known polymorphisms is even more dramatic (1000-fold increase for 
arbitrary sequence oligonucleotide fingerprinting analysis compared
with gel-based techniques). The present invention combines the 
throughput advantages for both DNA marker discovery and DNA 
marker screening (genotyping), both of which are important in 
genome analysis. 
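
The fold increases quoted above can be checked directly from the predicted daily figures. The following is only an illustrative Python sketch of that arithmetic; the dictionary and function names are assumptions introduced here, and the column labels follow the tables as read.

```python
# Predicted daily throughput, taken from the two tables above.
NEW_MARKERS_PER_DAY = {"RFLP": 10, "STR": 10, "RAPD": 10, "ASOF": 500}
# 200 genosensors per day x 1,000 polymorphic sites each = 200,000 for ASOF.
KNOWN_MARKERS_PER_DAY = {"RFLP": 200, "STRP": 200, "RAPD": 200, "ASOF": 200 * 1000}


def fold_increase(per_day, method="ASOF"):
    """Ratio of one method's throughput to the best of the other methods."""
    best_other = max(v for k, v in per_day.items() if k != method)
    return per_day[method] / best_other


discovery_gain = fold_increase(NEW_MARKERS_PER_DAY)    # 500 / 10
screening_gain = fold_increase(KNOWN_MARKERS_PER_DAY)  # 200,000 / 200
print(discovery_gain, screening_gain)
```

Under the stated assumptions the two ratios correspond to the fifty-fold and 1000-fold increases cited in the text.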

The present invention provides a method of detecting
DNA sequence polymorphisms in a sample, comprising the steps of:
amplifying a sample of genomic DNA using the polymerase chain
reaction (PCR); labeling the amplified genomic subset; hybridizing the
amplified genomic subset with a two-dimensional array of
surface-bound oligonucleotide probes of arbitrary sequence; and
detecting polymorphisms in the sample of genomic DNA by detecting
changes in the quantitative hybridization fingerprint within the DNA
probe array. Generally, the method of amplifying the genomic DNA in
the technique of the present invention is selected from the group
consisting of (1) PCR using individual short oligonucleotides
(8mer-12mer) of arbitrary sequence; (2) PCR using mixtures of longer
oligonucleotides, for example 100 13mer-15mer of arbitrary
sequence; (3) PCR using at least one pair of specific primers targeted
to at least one genomic region known to display a high degree of
sequence polymorphism; and (4) PCR using a multiplicity of primer
pairs targeted to specific genomic regions, for example, 20-100
sequence tagged sites (STSs). Generally, a person having ordinary
skill in this art can detect polymorphisms in the sample of genomic
DNA by detecting changes in the quantitative hybridization
fingerprint within the DNA probe array (i.e., changes in the relative
quantity of label at different sites) using such techniques as
phosphorimager analysis, autoradiography and CCD camera image
analysis.
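
One way to express "changes in the relative quantity of label at different sites" is to normalize each array to its total signal and then compare the two samples spot by spot. The sketch below assumes that spot intensities have already been extracted as equal-length lists; the function names, the two-fold threshold and the small floor value are illustrative assumptions, not values given in this disclosure.

```python
import math


def relative_fingerprint(intensities):
    """Scale raw spot intensities so they sum to 1 (relative quantity of label)."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("array carries no signal")
    return [x / total for x in intensities]


def changed_sites(sample_a, sample_b, min_fold=2.0, floor=1e-6):
    """Return indices of probe sites whose relative signal differs by >= min_fold.

    `floor` keeps the ratio finite when a signal is lost or newly created.
    """
    rel_a = relative_fingerprint(sample_a)
    rel_b = relative_fingerprint(sample_b)
    hits = []
    for i, (a, b) in enumerate(zip(rel_a, rel_b)):
        ratio = max(a, floor) / max(b, floor)
        if abs(math.log2(ratio)) >= math.log2(min_fold):
            hits.append(i)
    return hits


# Hypothetical four-spot arrays: the last spot is much weaker in sample B.
print(changed_sites([100, 100, 100, 100], [100, 100, 100, 10]))
```

Normalizing first means that a uniform difference in labeling efficiency or exposure between two slides is not mistaken for a polymorphism.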

The present invention also provides a method of 
oligonucleotide array fingerprinting for classification or identification 
of species in a biological sample, comprising the steps of: extracting 
DNA from the biological sample; conducting the polymerase chain 
reaction to prepare a set of DNA fragments corresponding to a subset 
of genomic sequences; labeling the amplified genomic subset; 
hybridizing the labeled fragments to arrays of oligonucleotides of 
arbitrary sequences; and making species classification or 
identification by comparing the hybridization fingerprint across the 
DNA probe array with a database of specific hybridization 
fingerprints previously determined to correspond to known species. 
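
The database comparison described above can be sketched as a nearest-reference lookup. The text does not name a similarity measure, so Pearson correlation is used here purely as an assumption; the reference fingerprints, species names and the 0.9 acceptance score are likewise hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length intensity vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat fingerprint carries no pattern to compare
    return cov / math.sqrt(var_x * var_y)


def identify(query, database, min_score=0.9):
    """Return (name, score) of the best-matching reference, or (None, score)."""
    best_name, best_score = None, -1.0
    for name, reference in database.items():
        score = pearson(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_score:
        return None, best_score
    return best_name, best_score


# Hypothetical reference fingerprints over a six-probe array.
REFERENCE_DB = {
    "species_A": [9.0, 1.0, 0.5, 7.0, 0.2, 3.0],
    "species_B": [0.5, 8.0, 6.0, 0.3, 5.0, 0.1],
}

print(identify([18.0, 2.2, 1.0, 13.5, 0.5, 6.1], REFERENCE_DB))
```

Correlation is insensitive to overall signal scale, so a query that is uniformly brighter than its reference can still match it.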

The present invention further provides a method of 
analyzing mixed populations of organisms in environmental samples 
by oligonucleotide array fingerprinting, comprising the steps of: 
extracting DNA from a sample of soil, water, or industrial process 
stream; conducting PCR to prepare DNA fragments corresponding to a 
subset of genomic sequences in the environmental sample; labeling 
the amplified fragments; hybridizing the labeled fragments to arrays 
of oligonucleotide probes of arbitrary sequences; and detecting 
differences in cellular populations between environmental samples, 
reflected by differences in the quantitative hybridization fingerprints 
across the oligonucleotide arrays. 

The present invention in addition provides a method of 
analyzing patterns of gene expression, comprising the steps of: 
extracting RNA from a cellular sample; preparing DNA fragments
representing expressed genes; labeling the DNA fragments;
hybridizing the labeled fragments to arrays of oligonucleotide probes 
of arbitrary sequence; and detecting changes in gene expression from 
changes in the relative hybridization intensity at different positions 
across the DNA probe array. Generally, the method of preparing DNA
fragments representing expressed genes is selected from the group 
consisting of reverse transcriptase polymerase chain reaction (RT- 
PCR) to prepare cDNA, PCR strategies to prepare subfractions of 
expressed sequences, as used in gel electrophoresis-based
differential display analysis, and steps of PCR, restriction
fragmentation, subtractive hybridization and gel electrophoresis, as 
used in representational difference analysis (RDA). 

The present invention also provides a method of direct
fingerprinting of complex genomes without DNA amplification,
comprising the steps of: mixing genomic DNA extracted from a
biological sample with at least one labeled oligonucleotide probe of
arbitrary sequence and hybridizing the mixture with an array of
arbitrary sequence capture probes, using conditions of temperature
and ionic strength under which neither the labeled probe(s), nor
capture probes alone will stably hybridize with the DNA target, but
under which capture and labeled probes, when tandemly hybridized
to a target strand to form a longer, contiguously base-stacked
combined duplex region, will result in stable capture of the target
strand; and comparing the hybridization fingerprint with genomic
fingerprints obtained from different biological samples.

The present invention further provides a method of 
direct transcriptional profiling in a biological sample, comprising the 
steps of: mixing bulk messenger RNA extracted from the biological 
sample with at least one labeled oligonucleotide probe of arbitrary 
sequence and hybridizing the mixture with an array of arbitrary
sequence capture probes, using conditions of temperature and ionic 
strength under which neither the labeled probe(s), nor capture 
probes alone will stably hybridize with the RNA target, but under 
which capture and labeled probes, when tandemly hybridized to a
target strand to form a longer, contiguously base-stacked combined 
duplex region, will result in stable capture of the RNA transcript; and 
comparing the hybridization fingerprint with RNA fingerprints 
obtained from different biological samples. 

The arbitrary sequence capture probes and labeled
probes used in the tandem hybridization embodiment of direct
nucleic acid fingerprinting are preferably of length 7-9 bases.
Libraries of capture probes and labeled probes for nucleic acid
fingerprinting can be conveniently maintained, to provide a universal
resource for fingerprinting of any nucleic acid sample. Hybridization
fingerprints of known genomes or associated with known
physiological conditions can be archived in a database and queried
for identity and similarity with newly acquired fingerprints.

The oligonucleotide probes that are mixed with the nucleic acid
sample in the tandem hybridization embodiments of arbitrary
sequence oligonucleotide fingerprinting described above can be
labeled with a variety of tags, selected from the group consisting of:
radioactive labels (32P, 33P, 35S), which can be introduced onto the
5'-end of synthetic oligonucleotides using polynucleotide kinase;
fluorescent tags, which can be introduced into the probes during
chemical synthesis of oligonucleotides (using fluorescent
phosphoramidites), or chemically coupled with primary
amine-derivatized oligonucleotides; and biotin, which can also be
introduced into the probes during chemical synthesis of
oligonucleotides. The simultaneous use of a multiplicity of fluorescent
labels can greatly increase the information content of the
hybridization fingerprint. The use of biotinylated probes has the
advantage of enabling enzymatic signal amplification to produce
fluorescent, chemiluminescent or colored products, through use of a
variety of commercially available enzyme-conjugated streptavidin and
signal-generating substrates.

In the direct fingerprinting (tandem hybridization)
embodiment of the present invention, designed for analysis of nucleic
acid samples of high genetic complexity, the preferred hybridization
substrate is channel glass or porous silicon (flowthrough genosensor),
in which probes are immobilized within patches of densely arrayed
channels of 1-10 micron diameter extending across a glass or silicon
dioxide layer typically 500 microns thick. The flowthrough
genosensor has the following important advantages over the flat
surface genosensor configuration, which enable the direct
fingerprinting embodiments of the present invention: improved
hybridization kinetics, detection sensitivity and dynamic range, due
to greatly increased surface area per unit cross section; greatly
improved hybridization of dilute nucleic acid solutions, which can be
slowly flowed through the porous hybridization array; and ability to
simultaneously analyze both strands of duplex DNA fragments
(simply by heat-denaturing a dilute DNA sample prior to passing it
through the flowthrough genosensor), without having to physically
isolate the two strands prior to hybridization, as is typically required
for hybridization on a flat surface.

In all embodiments of arbitrary sequence oligonucleotide
fingerprinting of the present invention, hybridization is generally
carried out as follows. Oligonucleotide arrays on glass are
"prehybridized" by soaking for 1 hr at room temperature in a
"blocking solution" followed by a brief water wash. A solution of
10-20 mM tripolyphosphate is an effective and economical blocking
solution for minimizing the nonspecific binding of 32P-labeled target
strands to glass slides. Target DNA (typically, PCR product) is
dissolved in (or added to) hybridization buffer (either 6X SSC or 3.3 M
tetramethylammonium chloride in 50 mM Tris-HCl (pH 8), 2 mM
EDTA, 0.1% SDS and 10% polyethylene glycol-8000) at a concentration
of 10-50 fmol strands per microliter (10-50 nM). If the target
strands are labeled with 32P, a minimum of 2,000 cpm per microliter
is used in the hybridization mixture, and prior to addition of labeled
DNA to the hybridization mixture, unincorporated label is removed
by loading the DNA onto a Microcon-3 microconcentrator (Amicon,
Beverly, MA) and washing three times with water. Furthermore, if
PCR is used to amplify the target, the PCR product is processed with a
Millipore (Bedford, MA) Ultrafree spin-filter (30,000 molecular
weight cutoff) to remove excess PCR primers prior to hybridization.
An aliquot of target DNA in hybridization buffer is pipetted onto the
microscope slide (20 microliters for an array occupying 1/3 of the
slide or 60 microliters for the entire slide) and covered with a glass
cover slip. The slide is incubated at 6°C for 2 hr to overnight,
then the slide is washed at room temperature for at least 1 hr with
hybridization buffer without PEG. For hybridization of immobilized
probes of different lengths, variations in the hybridization and
temperature should be explored to optimize the hybridization with
respect to signal intensity and mismatch discrimination.
Hybridization of 12mer arrays can be conveniently carried out at
room temperature in the above hybridization buffer. If target
strands are labeled with 32P, hybridization can generally be
quantitated within a few minutes using a phosphorimager, although
overnight exposure against X-ray film is adequate for
autoradiographic detection.
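
The quantities quoted in the protocol above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are assumptions introduced here, and the inputs are the concentrations, volumes and count rates stated in the text.

```python
# Decimal exponents of the SI prefixes involved.
FEMTO_EXP, MICRO_EXP, NANO_EXP = -15, -6, -9


def fmol_per_ul_to_nM(conc_fmol_per_ul):
    """Convert fmol/uL to nM.

    mol/L = fmol * 10^-15 / (uL * 10^-6); dividing by 10^-9 expresses it in nM.
    The exponents cancel, so 1 fmol/uL equals 1 nM.
    """
    exponent = FEMTO_EXP - MICRO_EXP - NANO_EXP  # = 0
    return conc_fmol_per_ul * 10 ** exponent


def strands_applied_fmol(conc_fmol_per_ul, volume_ul):
    """Total target strands (fmol) in the aliquot pipetted onto the slide."""
    return conc_fmol_per_ul * volume_ul


def min_counts_applied(cpm_per_ul, volume_ul):
    """Minimum total radioactivity (cpm) in the aliquot."""
    return cpm_per_ul * volume_ul


print(fmol_per_ul_to_nM(10), fmol_per_ul_to_nM(50))           # stated range in nM
print(strands_applied_fmol(10, 20), strands_applied_fmol(50, 60))
print(min_counts_applied(2000, 20), min_counts_applied(2000, 60))
```

For a 20 microliter aliquot at the stated minimum activity this corresponds to 40,000 cpm applied to the array.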

The present invention in addition provides a method for 
preparation of oligonucleotide arrays for hybridization analysis of
nucleic acid samples, comprising the steps of: chemical synthesis of 
oligonucleotide probes using the standard phosphoramidite method 
with 3'-Amino-Modifier C3 CPG solid phase support (available from 
Glen Research, Sterling, VA), which generates the 3'-aminopropanol 
function upon cleavage of the oligonucleotide from the support; 
cleaning the glass surface to be used as hybridization support with at 
least one organic solvent (for example, acetone and ethanol), followed 
by drying at elevated temperature; dissolving the
3'-aminopropanol-derivatized oligonucleotides in water at a
concentration of 10-20
micromolar; applying a small droplet of each oligonucleotide solution 
onto the clean, dry glass surface, typically in a volume of 10-1000 
nanoliters, placed 0.5-2 millimeters apart on the surface; incubating 
at room temperature (typically 5-30 minutes), followed by washing 
with water, air drying and storing desiccated at room temperature.

The simplified attachment method described above is 
more convenient, faster and more reliable than the previous 
epoxysilane-amine attachment method, and also gives a lower 
background of nonspecific binding of target strands to the glass 
surface. Both attachment methods yield a similar probe attachment 
density within each hybridization site (approx. 10^10 to 10^11 molecules
per square millimeter of glass surface). Oligonucleotide probe
solutions can be arrayed across the hybridization support manually,
using a template below the glass surface to guide the positioning of 
each droplet, or alternatively, robotically, using an automated fluid 
dispensing instrument such as a Hamilton Microlab 2200 system. The
latter instrument is capable of reproducibly delivering droplets as 
small as ten nanoliters onto a glass surface, at 0.5-1 mm
center-to-center spacing. In the preparation of extensive arrays of
oligonucleotides using the simplified attachment procedure described
above, covalent attachment occurs quickly, and even though the first 
droplets applied to the surface may dry before the last droplet is 
applied, the entire array may be held at room temperature until all 
droplets dry, then washed with water, yielding uniform attachment 
density across the array.

The following examples are given for the purpose of 
illustrating various embodiments of the invention and are not meant 
to limit the present invention in any fashion. 

EXAMPLE 1

Rationale of ASOF

In several embodiments of the arbitrary sequence
oligonucleotide fingerprinting technique provided herein, genomic
DNA was subjected first to the polymerase chain reaction using a
single short primer of arbitrary sequence or a mixture of longer
arbitrary sequence primers. The amplified genomic subset was then
labeled and hybridized with a two-dimensional array of a few
hundred to a few thousand different surface-bound oligonucleotide
probes. Polymorphisms that affect priming events during PCR or
affect the binding of amplified target to surface-tethered probes are
expected to induce changes in the hybridization fingerprint within
the DNA probe array. The arbitrary sequence oligonucleotide
fingerprinting method enables rapid identification of DNA markers
followed by simultaneous screening of large numbers of markers.
The arbitrary sequence oligonucleotide fingerprinting technique
speeds the identification of genes and alleles relevant to many
disciplines, including pharmaceutical development, agricultural
breeding programs, and forensics.

The concept of arbitrary sequence oligonucleotide 
fingerprinting is based, at least in part, on the following rationale. 
First, mutations at polymorphic sites will disrupt base pairing with 
PCR primers annealing at these sites and will interfere with 
hybridization of probes targeted to the polymorphic sites. Second,
if a procedure of genomic sampling (using PCR to select specific 
sequences from the total genomic pool) is carried out which depends 
on base pairing, the population of PCR-sampled genomic sequences 
may be perturbed by DNA sequence polymorphisms. Third, the 
sequence variations (polymorphisms) represented in the set of 
amplified fragments are expected to be revealed by differences in 
the hybridization fingerprints produced from DNA of different 
individuals. Fourth, after numerous arbitrary sequence
oligonucleotide fingerprinting experiments are carried out to identify
specific oligonucleotides (within the array of arbitrary sequence 
probes) that are capable of revealing sequence polymorphism (ASOF 
markers) for each set of PCR fragments (produced by a specific PCR 
condition), then specific combinations of PCR and arrayed probes can
be used simultaneously to analyze numerous ASOF markers. Finally, 
since the arbitrary sequence oligonucleotide fingerprinting method 
enables simultaneous analysis of numerous sequence polymorphisms, 
a person having ordinary skill in this art is able to screen for 
numerous polymorphic markers very rapidly. 

The present invention discloses a two-step "sampling"
procedure which is sensitive to sequence variation at either step
(priming or hybridization) and the procedure simultaneously can
examine thousands of sampled sequences for polymorphism. The
technology involves two steps: First, a PCR reaction is carried out to
specifically amplify a subfraction of the genome. Then the amplified
DNA product is hybridized to a grid (e.g., 50 x 50 array) of
end-linked oligonucleotide probes (a DNA probe array, or
"genosensor") to yield a hybridization pattern. The nucleotide
sequences of the PCR primers and support-bound probes are
arbitrarily chosen (within certain selection rules) to ensure wide
"sampling" of genomic sequence polymorphisms and to enable
uniform stability of potential duplexes formed with probes of
different sequence.

The use of short (e.g., 8mer-10mer) PCR primers with
genomic DNA of plants or animals typically yields 50-100 bands in a
gel electrophoretic assay, in the size range of a few hundred to a few
thousand base pairs. The set of sampled genomic sequences typically
represents a total of 50,000-100,000 base pairs of genomic DNA.
When such a mixture produced from several individuals is analyzed
by gel electrophoresis, one is lucky to find a single polymorphism
(RAPD marker) with any given primer, seen as the presence or
absence or shifting of a gel band. The present invention enables
sequence variation within the amplified (sampled) genomic
sequences to be detected more readily by hybridization of the entire
mixture of fragments to an array of a few hundred to a few thousand
oligonucleotide probes, yielding a complex "fingerprint" that will vary
at one or more sites (by loss of hybridization signals, creation of new
hybridization signals, or changes in relative signal intensity)
compared with genomic DNA sampled from another individual.
Thus, polymorphism within genomic targets of the arrayed DNA
probes alters the hybridization "fingerprints." If mutations within
TABLE I

Probe         Ave. no. of occurrences    Ave. number of hybridization signals
length (p)    within the amplified       for a given size of array
              target                     20x20 array    50x50 array

8mer          1.53                       (610)          (3,815)
9mer          0.38                       152            954
10mer         0.10                       38.1           238

From the above TABLE I, it appears that an array of 9mer
probes is a preferred embodiment of the arbitrary sequence
oligonucleotide fingerprinting technique of the present invention.
Further, the approximate number of ASOF markers that could be
discovered in a single hybridization experiment (performed with
DNA from ten individuals) using a 50 x 50 probe array must be
considered. If an estimate of 0.005 for the average frequency of
useful polymorphism per base pair (minor allele detectable in at
least 10% of individuals) is made, and it is assumed that 50% of
single base changes are detectable at either the level of hybridization
or priming, predictions can be made that hybridization of the
products of arbitrary primer PCR (from ten individuals) to a 50x50
array of arbitrary sequence 9mer probes will identify 20-25
polymorphisms.
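
The entries of TABLE I follow from the relation n = 100,000/4^p given in Example 2, multiplied by the number of probes in the array. The Python sketch below reproduces them. The last function is an assumed reading of how the 20-25 estimate arises (each hybridization signal taken to interrogate p bases of target); the text does not spell out that step, and all function names are illustrative.

```python
TARGET_BASES = 100_000  # 50,000 bp of amplified target, counting both strands


def occurrences(probe_length, target_bases=TARGET_BASES):
    """Average occurrences n of one probe in the target: n = target / 4**p."""
    return target_bases / 4 ** probe_length


def expected_signals(probe_length, array_side):
    """Average number of signals for an array_side x array_side probe array."""
    return occurrences(probe_length) * array_side ** 2


def expected_markers(probe_length, array_side, polymorphism_rate=0.005,
                     detectable_fraction=0.5):
    """Assumed reading: each signal interrogates probe_length bases of target."""
    bases_sampled = expected_signals(probe_length, array_side) * probe_length
    return bases_sampled * polymorphism_rate * detectable_fraction


for p in (8, 9, 10):
    print(f"{p}mer  n={occurrences(p):.2f}  "
          f"20x20={expected_signals(p, 20):.1f}  "
          f"50x50={expected_signals(p, 50):.0f}")
print(f"9mer, 50x50: about {expected_markers(9, 50):.1f} polymorphisms")
```

With the stated inputs the 9mer, 50x50 case gives a value between 21 and 22, which falls inside the 20-25 range quoted above.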



EXAMPLE 3

PCR primers and amplification conditions 

For untargeted ASOF analysis utilizing PCR to generate a
subset of genomic fragments, primers of arbitrary sequence are used,
within limits of the following criteria: (i) 55-65% [G+C] content; (ii)
exclusion of sequences containing strong secondary structure; and
(iii) exclusion of sequences corresponding to known repeated
genomic sequences complementary to the PCR primer disrupt the PCR
priming ability (or create new priming opportunities) then the final 
hybridization pattern would also be perturbed. Figure 1 is a 
schematic diagram summarizing the method of arbitrary sequence 
oligonucleotide fingerprinting in embodiments that utilize PCR to 
prepare specific collections of fragments. 

EXAMPLE 2 

Theoretical considerations

The ultimate implementation of ASOF technology is based 
upon experimentally optimized parameters of PCR primer and 
hybridization probe composition, length and number. However, it is 
useful to design starting conditions and to estimate the throughput of 
arbitrary sequence oligonucleotide fingerprinting marker analysis 
(compared with current technology), based on statistical predictions. 
The number and length of genomic fragments produced during 
arbitrary primer PCR can be experimentally determined and the 
appropriate set of oligonucleotide probes to be included in the 
hybridization array should be specified. Assuming that the total
length of amplified target sequence that is to be hybridized to the set 
of arrayed probes is 50,000 base pairs (100,000 bases), then for a 
probe of length, p, the average number of occurrences, n, of the 
probe within the target sequence is represented by n = 100,000/4^p.
From this value, one can predict the average number of hybridization 
signals that would be produced with a given composition (number 
and length) of DNA probe array. The following Table summarizes 
these calculations for DNA probes of various length: 
sequences in genomic DNA such as Alu, SINE and LINE. The arbitrary
sequence PCR is carried out under conditions such as those
described by Caetano-Anolles (9-12) for maximizing the detection of
polymorphism in the gel electrophoretic analysis of PCR products
produced with arbitrary sequence oligonucleotides of length 8-10.
Single primer PCR was conducted with DNA samples prepared from
two unrelated individuals, designated CF01 and CF02. Each PCR
reaction contained, in 100 µL volume: 40 pmol primer, 50 ng DNA,
2.5 U Taq polymerase, 200 µM each dNTP, and standard PCR buffer.
The thermocycling program used was: 90°C 1 second; ramp to 23°C at
0.2°C/second; hold 23°C 1 second; ramp to 90°C at 0.6°C/second;
repeat above cycle 34 times; hold at 4°C.
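
Because the program above is dominated by slow temperature ramps, its running time is not obvious from the one-second holds. The following sketch estimates it from the stated rates. It assumes that "repeat above cycle 34 times" means 35 cycles in total and ignores any instrument overhead; the constant and function names are illustrative.

```python
# Thermocycling program as stated in the text (temperatures in deg C).
HOT_C, COLD_C = 90.0, 23.0
HOLD_S = 1.0               # hold time at each temperature
RAMP_DOWN_C_PER_S = 0.2
RAMP_UP_C_PER_S = 0.6
TOTAL_CYCLES = 1 + 34      # assumption: first pass plus 34 repeats


def ramp_seconds(start_c, end_c, rate_c_per_s):
    """Time needed to move between two temperatures at a constant rate."""
    return abs(end_c - start_c) / rate_c_per_s


def cycle_seconds():
    """Duration of one full cycle: hold, ramp down, hold, ramp up."""
    return (HOLD_S
            + ramp_seconds(HOT_C, COLD_C, RAMP_DOWN_C_PER_S)
            + HOLD_S
            + ramp_seconds(COLD_C, HOT_C, RAMP_UP_C_PER_S))


def program_hours(cycles=TOTAL_CYCLES):
    """Total running time in hours, excluding the final 4 deg C hold."""
    return cycles * cycle_seconds() / 3600.0


print(f"{cycle_seconds():.0f} s per cycle, {program_hours():.1f} h in total")
```

Under these assumptions a cycle lasts roughly seven and a half minutes, nearly all of it spent in the two ramps.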

Figure 2 displays silver-stained (Figure 2A) and ethidium
bromide-stained (Figure 2B) gels of the PCR fragments, along with a
marker lane consisting of products of φX174 RF DNA cleaved with
HaeIII restriction enzyme. Table II shows the lanes illustrated by
Figure 2.

TABLE II

Primer    Sequence     DNA samples
II        GTGTCGATC    CF01, CF02
IV        TGAGACGAC    CF01, CF02
VII       CGTGTAGTC    CF01, CF02
VIII      CGTGTACAG    CF01, CF02
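
The primer sequences listed above can be checked against criterion (i) of Example 3 (55-65% [G+C] content). The sketch below covers only that criterion; screening for secondary structure and repeated genomic sequences, criteria (ii) and (iii), is not attempted. The function names are illustrative assumptions.

```python
# Primer sequences as listed in Table II.
PRIMERS = {
    "II": "GTGTCGATC",
    "IV": "TGAGACGAC",
    "VII": "CGTGTAGTC",
    "VIII": "CGTGTACAG",
}


def gc_percent(sequence):
    """Percentage of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    gc_count = sum(1 for base in sequence if base in "GC")
    return 100.0 * gc_count / len(sequence)


def meets_gc_criterion(sequence, low=55.0, high=65.0):
    """True if the [G+C] content lies within the 55-65% window."""
    return low <= gc_percent(sequence) <= high


for name, seq in PRIMERS.items():
    print(f"Primer {name}: {seq}  GC = {gc_percent(seq):.1f}%  "
          f"ok = {meets_gc_criterion(seq)}")
```

Each of the four 9mers contains five G or C bases, which places it just inside the lower bound of the window.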
EXAMPLE 4

PCR product fragmentation and labeling

The maximal DNA target length that can be captured 
efficiently by support-bound oligonucleotide probes has been shown. 
It was found that PCR fragments of at least 1000 bases can be 
hybridized to 9mer oligonucleotides tethered to a glass surface. If it 
is necessary to fragment PCR products prior to hybridization to the 
genosensor array, sonication to produce random fragments of a few 
hundred base pairs in length is used. To enable quantitation of 
hybridization within the genosensor array using a phosphorimager 
system, PCR products are 5'-end labeled using polynucleotide kinase 
and [γ-32P]ATP prior to hybridization. If additional detection
sensitivity is required, target DNA is labeled by incorporation of
[α-32P]dNTPs in the PCR reactions.

EXAMPLE 5

Oligonucleotide array preparation 

Optimal conditions for preparation of oligonucleotide 
arrays and for carrying out discriminative hybridization have been 
defined. Preferred conditions are summarized as follows. 
Oligonucleotides are synthesized by the "porous wafer" segmented 
approach previously developed (13 and 14). To enable simple probe
immobilization on a glass surface it is preferable to synthesize the 
probes using 3'-Amino-Modifier C3 CPG (Glen Research) or the 
equivalent support from CloneTech, which yields terminal
3'-aminopropanol-derivatized oligonucleotides upon cleavage from the CPG
support, as illustrated in Figure 3. 

A simple procedure has been devised for attachment of
3'-aminopropanol-oligonucleotide probes to underivatized glass
surfaces. This procedure, which is suitable for the techniques of the
present invention, involves the following steps: A glass plate is first
cleaned by sonication in hexane and absolute ethanol for 10 minutes
each. The slides are then incubated for 2-5 hours at 80°C in a drying
oven. Slides are stored desiccated under vacuum until used for
probe attachment. Attachment of 3'-aminopropanol-derivatized
oligonucleotides to the glass surface is then carried out as follows. A
Hamilton Microlab 2200 robotic fluid dispensing system is used to
place 3'-aminopropyl-derivatized oligonucleotides (10 nM solution in
water) in 10-200 nl droplets onto the clean glass surface, at 0.5-2.0
mm center-to-center spacing. Slides are incubated at room
temperature for 30 minutes, washed in water, then stored dry at
room temperature. Quantitation of oligonucleotide attachment
indicates that within each area of immobilized probe, oligonucleotide
molecules are tethered to the glass with an average spacing of
50-100 Å using this procedure, corresponding to approximately
10^10-10^11 probes/mm^2.
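
The correspondence between probe spacing and surface density stated above can be checked with a short calculation. The sketch assumes a square lattice (one probe per spacing-by-spacing area), which is a simplification introduced here and not a statement from the text; the names are illustrative.

```python
ANGSTROM_PER_MM = 1e7  # 1 mm = 10^7 angstrom


def probes_per_mm2(spacing_angstrom):
    """Surface density for probes on a square lattice with the given spacing."""
    probes_per_mm = ANGSTROM_PER_MM / spacing_angstrom
    return probes_per_mm ** 2


for spacing in (50, 100):
    print(f"{spacing} angstrom spacing -> {probes_per_mm2(spacing):.1e} probes/mm^2")
```

On this model a spacing of 50-100 angstrom corresponds to a density of the order of 10^10 probes per square millimeter, in line with the approximate figure quoted.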

Figure 4 shows one synthesis scheme for covalent 
attachment. Formation of the ester linkage, rather than amide 
linkage, is supported by the finding that the linkage is stable in
dilute acid (pH 4) but labile in dilute base (pH 10). 

EXAMPLE 6 

Hybridization Fingerprinting 
Although some tailoring of hybridization conditions may
be needed for the arbitrary sequence oligonucleotide fingerprinting
technique, conditions identical to or very similar to those described
below will achieve reproducible hybridization patterns.
Oligonucleotide arrays on glass slides were pre-hybridized with 10
mM ATP at room temperature for 1 hour, then rinsed with
hybridization solution consisting of 3.3 M tetramethylammonium
chloride (TMAC), 50 mM Tris-HCl (pH 8.0), 2 mM EDTA, 0.1% SDS and
10% polyethylene glycol (PEG). 32P-labeled target DNA was dissolved
in the hybridization solution at a concentration of 15-30 fmol/µl and
a minimum of 1,000 cpm/µl, and 20 µl of this solution was applied to
the area of the slide containing the attached probes. A cover slip was
applied and the slide was incubated at 90°C for 5 minutes, then 6°C
for at least 2 hours, then washed with hybridization solution without
PEG at room temperature for 2 hours.

Representative hybridization fingerprints are shown in 
Figure 5, for three different human DNA samples following PCR using 
a single primer. PCR was carried out using DNA from three different 
unrelated humans, designated CF01, CF02 and UK. The PCR reactions 
contained (in 100 µl) 100 ng template DNA, 200 µM each dNTP, 0.2
µM [5'-32P]Primer I (5'-GTGTCGATG-3'), 5 U Taq polymerase, and
standard PCR buffer. Prior to addition of template DNA and Taq 
polymerase, tubes were placed under a germicidal UV lamp and 
irradiated for 10 minutes. Tubes were held at 95°C 5 minutes, then 
30 cycles of thermocycling were conducted (90°C 1 minute, 30°C 1 
minute, 72°C 2 minutes), then tubes were brought to 95°C for 5 
minutes and another 2.5 U Taq polymerase was added and 30 more 
cycles of PCR were conducted as above. PCR mixtures were 
centrifuged through Ultrafree-30,000 spin filters (Millipore) to 
remove free primer, then suspended in hybridization buffer and 
hybridized to 9mer arrays on microscope slides, as described in the 
previous paragraph. Two slides were used to obtain hybridization 
fingerprints using the above PCR products. Each slide contained 200 
different 9mers immobilized to the glass as described in Example 5 
and arranged as indicated in TABLE III. Table IV shows the
sequences of the 9mer probes used in the oligonucleotide 
fingerprints of Figure 5. One slide contained 9mers of "box 15/16" 
and the other M box 9/1 0". 
5 Close examination of the hybridization fingerprints of 

Figure 5 resulting from different DNA samples reveals several 
apparent differences at specific locations within the array. Further 
experimentation, which can be readily carried out by one skilled in 
the art, is needed to identify the useful arbitrary sequence 
oligonucleotide fingerprinting markers and utilize them in high 
throughput marker analysis, as follows: the experiment is repeated 
using additional DNA samples for each single primer PCR (for 
example, a total of ten DNA samples, each analyzed five times). In 
addition, slides containing additional sets of arbitrary sequence 
probes are also used to obtain hybridization fingerprints. After the 
oligonucleotide probes that show reproducible detection of a 
polymorphism (hybridization present in some samples and absent in 
others or displaying reproducible differences in signal intensity) are 
identified for a given PCR primer (i.e., for each collection of PCR 
fragments that represents a specific subset of genomic sequences), 
the ASOF marker-specific probes are arrayed onto a slide for 
simultaneous analysis of all such markers detectable using the 
specific PCR reaction. Different sets of ASOF marker probes are then 
used for each PCR condition, to further increase the number of ASOF 
markers analyzed simultaneously. 
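
The screening step described above (repeat each sample several times, then keep only probes whose signal is reproducible within a sample yet differs between samples) could be sketched as follows. This is only an illustrative sketch, not a procedure taken from the disclosure: the function name, the thresholds (max_cv, min_fold, floor) and the intensity values are all hypothetical, and only the sample labels CF01, CF02 and UK come from the text.

```python
from statistics import mean, pstdev


def candidate_marker_probes(replicates, max_cv=0.25, min_fold=2.0, floor=1.0):
    """Return indices of probes that look like reproducible polymorphic markers.

    replicates maps a sample name to a list of repeat fingerprints, each a
    list of probe intensities in the same probe order. All thresholds are
    hypothetical and would need to be tuned experimentally.
    """
    n_probes = len(next(iter(replicates.values()))[0])
    markers = []
    for probe in range(n_probes):
        sample_means = []
        reproducible = True
        for runs in replicates.values():
            values = [run[probe] for run in runs]
            avg = mean(values)
            # Above background, require a small coefficient of variation
            # across the repeat hybridizations of the same sample.
            if avg > floor and pstdev(values) / avg > max_cv:
                reproducible = False
                break
            # Clamp to the background floor so "absent" signals can be compared.
            sample_means.append(max(avg, floor))
        # Keep probes whose mean signal differs enough between samples.
        if reproducible and max(sample_means) / min(sample_means) >= min_fold:
            markers.append(probe)
    return markers


# Invented intensities for three probes, two repeats per sample.
demo = {
    "CF01": [[100, 50, 0], [110, 55, 0]],
    "CF02": [[105, 5, 0], [95, 6, 0]],
    "UK": [[98, 52, 0], [102, 48, 0]],
}
print(candidate_marker_probes(demo))
```

In this invented data set only the second probe (index 1) is flagged, because its signal is consistently strong in two samples and consistently weak in the third.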

EXAMPLE 7 
Direct nucleic acid fingerprinting without PCR 

The embodiments of the present invention of ASOF 
described in Examples 1, 2 and 6 employ PCR amplification to 
generate a subset of DNA sequences that can be fingerprinted by 
hybridization with an array of arbitrary sequence oligonucleotide 
probes. It is also possible to directly acquire hybridization 
fingerprints of DNA or RNA samples, without using DNA amplification 
to generate a subset of nucleic acid sequences of reduced genetic 
complexity. One way to achieve direct fingerprinting is to fragment 
the bulk nucleic acid sample (for example, by sonication, chemical 
cleavage or restriction enzyme digestion), label the fragments (for 
example, by use of polynucleotide kinase), then hybridize the entire 
mixture to an array of arbitrary sequence oligonucleotide probes of 
length greater than that used when PCR is used to generate a subset 
of target sequences. In the direct fingerprinting strategy the length 
of probes is chosen such that on average, each probe will hybridize 
with a maximum of one sequence within the entire collection of 
target strands present in the sample. 

The appropriate probe length can be determined by trial 
and error, but can also be predicted using the relationship n = L / 4^p, 
where n represents the average number of occurrences of a 
probe of length p in a target sequence of total length L. For a sample 
of human genomic DNA containing six billion bases of sequence, the 
average number of occurrences of a 17-base probe in the entire 
genome is predicted to be 0.35; for a bacterial genome containing ten 
million bases of total sequence, each 12-base oligonucleotide probe of 
arbitrary sequence will occur, on average, 0.60 times in the bacterial 
genome; and for a population of messenger RNA molecules of total 
length five million bases (example of transcribed sequences in a 
higher eukaryotic cell), the probability that an individual 12-base 
probe will yield a hybridization signal is predicted to be 0.30. 

Using the appropriate probe length for directly 
fingerprinting a nucleic acid of a given genetic complexity, the 
hybridization fingerprint will preferably include hybridization 
signals at only 1/4 to 2/3 of the hybridization sites. When the genetic 
complexity is high (millions or billions of bases of unique sequence in 
the sample) the hybridization fingerprint may be obtained using long 
hybridization times (hours to days) if the oligonucleotide probe array 
is attached to a flat surface such as a glass slide. The hybridization 
time can be shortened (minutes to hours) if the nucleic acid sample is 
slowly flowed through a channel glass or porous silicon hybridization 
substrate in which oligonucleotide probes are immobilized within 
patches of densely packed, straight, smooth channels, typically of 
diameter 1-10 micrometers, connecting the two faces of a glass or 
silicon wafer, typically 100-500 micrometers thick. 
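
The probe-length relationship used in this example (the average number of occurrences of a probe of length p in a target of total length L is L divided by 4 to the power p) can be checked numerically. The following is a minimal sketch, assuming that the four bases occur independently and with equal frequency; the function name expected_occurrences is illustrative only and does not appear in the disclosure.

```python
def expected_occurrences(total_length, probe_length):
    """Average occurrences n = L / 4**p of one arbitrary-sequence probe."""
    return total_length / 4 ** probe_length


# The three cases cited in this example.
cases = [
    ("human genomic DNA, 17-base probe", 6_000_000_000, 17),
    ("bacterial genome, 12-base probe", 10_000_000, 12),
    ("mRNA population, 12-base probe", 5_000_000, 12),
]
for label, total_length, probe_length in cases:
    n = expected_occurrences(total_length, probe_length)
    print(f"{label}: n = {n:.2f}")
```

The three printed values (0.35, 0.60 and 0.30) agree with the figures quoted in the text. Trying neighbouring probe lengths in the same function is a quick way to see how strongly a single added base (a four-fold change in n) shifts the expected occupancy of the array.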

EXAMPLE 8 

Direct nucleic acid fingerprinting using a tandem hybridization strategy 

Another strategy for achieving direct nucleic acid 
fingerprinting using an array of arbitrary sequence probes without 
DNA amplification is illustrated in Figure 6. The bulk nucleic acid 
sample, extracted from a biological sample (for example, animal or 
plant tissue, cultured cells or soil sample), is first fragmented (for 
example, by chemical fragmentation, sonication or using restriction 
enzyme digestion) and mixed with a high molar excess of at least one 
oligonucleotide probe of length m that is labeled (for example, with a 
radioactive tag, a fluorescent tag or biotin). The mixture is then 
hybridized with an array of arbitrary sequence "capture probes" of 
length n. Hybridization conditions (temperature, ionic strength or 
concentration of denaturants such as formamide) are chosen such 
that neither the labeled probes, nor the capture probes, will form a 
stable duplex structure with the target strands, but duplexes of 
combined length m+n will be stable. Under these conditions a target 
strand will hybridize to the array only where a capture probe and 
labeled probe hybridize in tandem on the target strand, forming a 
contiguously base-stacked region of length m+n. 

When unbound material is washed away with 
hybridization buffer, a hybridization fingerprint will be produced, 
which can be visualized and quantitated using a phosphorimager 
with 32P, 33P or 35S labels, or using a CCD camera and excitation light 
source with fluorescent tags. The quantitative hybridization 
fingerprint can be archived in a computer database and compared 
with fingerprints prepared from different samples. Mixtures of 
labeled probes, containing a multiplicity of distinguishable 
fluorescent tags, can be used to produce a "multicolor" hybridization 
fingerprint of greater information content. 

The appropriate length of labeled and capture probes that 
are to be used in the tandem hybridization strategy of direct nucleic 
acid fingerprinting can be determined by trial and error, but can also 
be estimated using the relationship n = L / 4^p to predict the average 
occurrence of probes within the entire target sequence, or the 
probability that a probe of length p will hybridize to a target 
sequence of length L. For example, in the case of a human genomic 
DNA sample of six billion bases, a 9-base capture probe is estimated 
to occur about 22,900 times (i.e., each 9mer capture probe is 
predicted to hybridize with about 22,900 different target sequences). 
As explained above, however, stable hybridization will occur only if a 
labeled probe hybridizes in tandem with the capture probe on the 
target strand. If the tandem probe is also a 9mer, the average 
occurrence of the probe in the 9-base target adjacent to the capture 
probe is estimated as n = 9/4^9, or 3.43 x 10^-5. The combined 
occurrence of the tandem hybridization of capture and labeled probe 
is estimated as the product of the individual occurrences, which in 
the above example is 0.79. Upward adjustment of this estimate 
would be necessary if labeled probes could hybridize in tandem with 
the capture probe on either side of the capture probe, while 
downward adjustment would be needed if one considers that the 
analysis would preferably be targeted to euchromatin (unique 
sequences) within the genome. Nevertheless, it appears that capture 
and labeled probes approximately nine bases in length would be 
appropriate for use in the direct fingerprinting of human genomic 
DNA using the tandem hybridization approach, although actual 
optimal probe length can readily be determined experimentally. 
Using the same statistical approach, the appropriate length of capture 
and labeled probes for direct fingerprinting of a nucleic acid sample 
of ten million bases (for example, a bacterial genome or total 
expressed sequences in a higher eukaryote) is predicted to be about 
seven bases. 
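
The tandem estimate in this paragraph can be reproduced with a short calculation. The sketch below simply multiplies the two occurrence estimates given in the text (capture probe sites in the whole sample, and labeled probe occurrence in the adjacent stretch); it assumes equal and independent base frequencies and ignores the upward and downward adjustments discussed above. The function names are illustrative and are not part of the disclosure.

```python
def expected_occurrences(total_length, probe_length):
    """n = L / 4**p for a probe of length p in a target of total length L."""
    return total_length / 4 ** probe_length


def tandem_occurrences(total_length, capture_length, labeled_length):
    """Estimated tandem hybridization events per capture probe."""
    # Expected number of capture-probe sites in the whole sample.
    capture_sites = expected_occurrences(total_length, capture_length)
    # Occurrence of one labeled probe in the short stretch adjacent to a
    # capture site, using the text's estimate n = m / 4**m.
    adjacent = expected_occurrences(labeled_length, labeled_length)
    return capture_sites * adjacent


for label, total, length in [
    ("human genomic DNA, 9mer probes", 6_000_000_000, 9),
    ("ten-million-base sample, 7mer probes", 10_000_000, 7),
]:
    print(f"{label}: {tandem_occurrences(total, length, length):.2f}")
```

For the human genomic DNA case this gives about 0.79, matching the figure in the text; for a ten-million-base sample with 7mer probes it gives about 0.26, which is consistent with the probe length of about seven bases suggested above.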

For direct fingerprinting of nucleic acid samples of high 
genetic complexity (for example, mammalian genomes or nucleic 
acids extracted from microbial populations) using the tandem 
hybridization strategy, the flowthrough genosensor configuration 
(utilizing a channel glass or porous silicon hybridization substrate) is 
greatly preferred, for the reasons given in Example 7. Furthermore, 
the nucleic acid strands bound to any given hybridization cell may be 
recovered from the support (for example, by elution with hot water) 
and used for further analysis (cloning, sequencing, PCR, etc.). An 
important additional feature of the tandem hybridization method of 
the present invention is that the combined sequence of tandemly 
hybridizing capture and labeled probes (m+n) can be used to define a 
sequence that can be synthesized and used for dideoxy sequencing or 
PCR amplification of the eluted nucleic acid strands. 

EXAMPLE 9 

Data collection and analysis 

The hybridization intensities across the DNA probe array 
are measured using a Fuji phosphorimager. This instrument is 10-20 
times more sensitive than standard X-ray film and can collect 
hybridization data across a total area of 20 x 40 cm. The Fuji 
phosphorimager system has resident software capable of 
quantitation of hybridization within the user-defined matrix and can 
store the data in digitized tabular form accessible to spreadsheet 
programs such as Excel. Alternatively, hybridization fingerprints 
can be analyzed by quantitative CCD camera imaging systems, when 
fluorescent or chemiluminescent labeling is used. 
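
Once hybridization intensities have been exported in tabular form, fingerprints from different samples can be compared numerically. The sketch below shows one possible approach, which is an assumption of this illustration and is not prescribed by the disclosure: each fingerprint is normalized to its total signal, and a query fingerprint is matched to the stored fingerprint with the highest Pearson correlation. All names and intensity values are hypothetical.

```python
from math import sqrt


def normalize(fingerprint):
    """Scale probe intensities so that they sum to 1 (removes loading differences)."""
    total = sum(fingerprint)
    return [value / total for value in fingerprint]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length intensity lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def best_match(query, database):
    """Return the name of the stored fingerprint most similar to the query."""
    q = normalize(query)
    return max(database, key=lambda name: pearson(q, normalize(database[name])))


# Invented four-probe fingerprints standing in for archived data.
database = {
    "strain A": [900, 100, 400, 50],
    "strain B": [100, 800, 60, 500],
}
query = [450, 60, 210, 20]
print(best_match(query, database))
```

With these invented values the query is matched to "strain A", since its relative pattern across the four probes follows that entry even though its absolute intensities are roughly halved.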

EXAMPLE 10 

A simple, reliable procedure is used to link directly the 
3'-aminopropanol-derivatized oligonucleotides to unmodified SiO2 
surfaces. The linkage is (i) stable in hot water, enabling multiple 
cycles of hybridization; (ii) stable in mild acid but labile in mild base 
(favoring the ester linkage over the amide linkage); (iii) not formed 
with 5'-hexylamine-derivatized oligonucleotides (primary amine 
alone is insufficient); (iv) inhibited by pretreatment of glass with 
propanolamine but not propylamine; and (v) blocked by acetylation 
of primary amine on oligonucleotide. The attachment reaction 
proceeds rapidly in aqueous solution at room temperature and gives 
a lower background of nonspecific binding of target DNA to the 
surface, compared with the previous epoxy-amine linkage method. 

The following procedure is carried out for attachment of 
oligonucleotides to glass surfaces using the new direct coupling 
chemistry. Oligonucleotides are chemically synthesized using the 
3'-Amino-Modifier C3 CPG support (Glen Research, Sterling, VA, cat. no. 
20-2950) with the standard phosphoramidite chemistry (21). During 
cleavage of the oligonucleotides from the support the C3 amino group 
(actually a propanolamine function) is created at the 3'-end. Custom 
oligonucleotides with this 3'-propanolamine modification are 
available from Genosys Biotechnologies, Inc. (The Woodlands, TX). 
Oligonucleotides are dissolved in water at a concentration of 10-20 
uM. Glass microscope slides are cleaned by rinsing with acetone and 
ethanol, and dried in an 80°C oven. Droplets of oligonucleotide 
solution (typically 50-250 nL) are placed onto the clean, dry slide, 
incubated at room temperature for 5-15 min, then rinsed with water, 
air-dried and stored desiccated at room temperature. The 
attachment reaction occurs rapidly, and if some of the droplets dry 
during the application of all oligonucleotides in an array, the slide 
should be held at room temperature until all droplets dry before 
washing with water. (The reaction is apparently complete upon 
drying.) If droplets are applied manually, the slide can be placed 
above a printed template to guide the placement of droplets. A 
commercially available robotic fluid dispensing system (Hamilton 
MicroLab 2200 system equipped with 21G needles and 50 uL 
syringes) is capable of robotically dispensing droplets as small as 10 
nL onto a glass slide at 1 mm center-to-center spacing (Beattie et al., 
1995a,b). 

For hybridization of target strands to nonamer 
oligonucleotides attached to microscope slides the following standard 
procedure (Beattie et al., 1995a,b) can be used. Slides are 
"prehybridized" by soaking for 1 hr at room temperature in a "blocking 
solution" followed by a brief water wash. 10-20 mM 
tripolyphosphate has been found to be an effective and economical 
blocking solution for minimizing the nonspecific binding of 32P-labeled 
target strands to the glass slide (Beattie et al., 1995b). Target 
DNA (typically, PCR product) is dissolved in (or added to) 
hybridization buffer (either 6XSSC or 3.3M tetramethylammonium 
chloride in 50 mM Tris-HCl (pH 8), 2 mM EDTA, 0.1% SDS and 10% 
polyethylene glycol-8000) at a concentration of 10-50 fmol 
strands/uL (10-50 nM). If the target strands are labeled with 32P, a 
minimum of 2,000 cpm/uL is used in the hybridization mixture, and 
prior to addition of labeled DNA to the hybridization mixture, 
unincorporated label is removed by loading the DNA onto a 
Microcon-3 microconcentrator (Amicon, Beverly, MA) and washed 
three times with water. Furthermore, if PCR is used to amplify the 
target, the PCR product is processed with a Millipore (Bedford, MA) 
Ultrafree spin-filter (30,000 mol-wt cutoff) to remove excess PCR 
primers prior to hybridization. An aliquot of target DNA in 
hybridization buffer is pipetted onto the microscope slide (20 uL for 
an array occupying 1/3 of the slide or 60 uL for the entire slide) and 
covered with a glass cover slip. The slide is incubated at 6°C for 2 hr 
to overnight, then the slide is washed at room temperature for at 
least 1 hr with hybridization buffer without PEG. For hybridization 
of immobilized probes of different lengths, variation in the 
hybridization conditions and temperature should be explored to optimize the 
hybridization with respect to signal intensity and mismatch 
discrimination. Hybridization of 12mer arrays can be conveniently 
carried out at room temperature in the above hybridization buffer. 
If target strands are labeled with 32P, hybridization can generally be 
quantitated within a few minutes using a phosphorimager (Beattie et 
al., 1995a,b), although overnight exposure against X-ray film is 
adequate for autoradiographic detection. 

The present invention also provides a method of species, 
strain, subtype or gender identification, comprising the steps of: 
extracting genomic DNA from an organism, tissue or cells; amplifying 
a subset of genomic DNA sequences by a polymerase chain reaction 
using one or more oligonucleotide primers of arbitrary sequence; 
introducing at least one label into said amplified subset of genomic 
DNA; combining said amplified labeled subset of genomic DNA with a 
two-dimensional array of surface-bound oligonucleotide probes 
under hybridizing conditions to form a quantitative hybridization 
fingerprint for said genomic DNA; and identifying the species, strain, 
subtype or gender of the organism, by comparing said hybridization 
fingerprint with a database of hybridization fingerprints previously 
obtained from known species, strains, subtypes or genders. 

The present invention additionally provides a method of 
analyzing and comparing mixed populations of organisms in 
biological or environmental samples, comprising the steps of: 
extracting DNA or RNA from a first biological or environmental 
sample; amplifying a first subset of nucleic acid sequences from said 
DNA or RNA extracted from said first biological or environmental 
sample by a polymerase chain reaction using one or more 
oligonucleotide primers of arbitrary sequence; introducing at least 
one label into said first subset of nucleic acid sequences; combining 
said first labeled, amplified subset of nucleic acid sequences with a 
two-dimensional array of surface-bound oligonucleotide probes 
under hybridizing conditions to form a first quantitative 
hybridization fingerprint for said first biological or environmental 
sample; extracting RNA or DNA from a second biological or 
environmental sample; amplifying a second subset of nucleic acid 
sequences from said DNA or RNA extracted from said second 
biological or environmental sample by a polymerase chain reaction 
using one or more oligonucleotide primers of arbitrary sequence; 
introducing at least one label into said second subset of nucleic acid 
sequences; combining said second labeled, amplified subset of nucleic 
acid sequences with said two-dimensional array of surface-bound 
oligonucleotide probes under hybridizing conditions to form a second 
quantitative hybridization fingerprint for said second biological or 
environmental sample; comparing said first quantitative 
hybridization fingerprint to said second quantitative hybridization 
fingerprint; and detecting differences in the population of organisms 
in said different biological or environmental samples, by detecting 
differences between said first quantitative hybridization fingerprint 
and said second quantitative hybridization fingerprint. 

In another embodiment, the present invention provides a 
method of direct genomic fingerprinting of nucleic acids extracted 
from a biological or environmental sample, comprising the steps of: 
mixing genomic DNA or RNA extracted from a biological sample with 
a high molar excess of at least one labeled oligonucleotide probe of 
arbitrary sequence; hybridizing said mixture with an array of 
arbitrary sequence capture probes, using conditions of temperature 
and ionic strength under which neither the labeled probe(s), nor 
capture probes alone will stably hybridize with the DNA target, but 
under which capture and labeled probes, when tandemly hybridized 
to a target strand to form a longer, contiguously base-stacked 
combined duplex region, will result in stable capture of the target 
strand; and comparing the hybridization fingerprint with genomic 
fingerprints obtained from different biological samples. In one 
aspect, the arbitrary sequence oligonucleotide probe array is formed 
on a flat surface. The method may be performed wherein the 
arbitrary sequence oligonucleotide probe array is formed within a 
flowthrough layer of channel glass or porous silicon. Further, a 
multiplicity of labeled primers may be mixed with the 
nucleic acid extracted from a biological or environmental sample. If a 
multiplicity of distinguishable labels is used, each may be 
incorporated into a different labeled probe. Preferably, labeled 
probes and said capture probes are 8-10 bases in length. 

The present invention also provides a method of directly 
analyzing and comparing mixed populations of organisms in 
biological or environmental samples, comprising the steps of: 
extracting DNA or RNA from a first biological or environmental 
sample; mixing said DNA or RNA extracted from said first biological 
or environmental sample with a high molar excess of at least one 
labeled oligonucleotide probe of arbitrary sequence; hybridizing said 
mixture derived from said first biological or environmental sample 
with an array of arbitrary sequence capture probes, using conditions 
of temperature and ionic strength under which neither the labeled 
probe(s), nor capture probes alone will stably hybridize with the DNA 
target, but under which capture and labeled probes, when tandemly 
hybridized to a target strand to form a longer, contiguously base- 
stacked combined duplex region, will result in stable capture of the 
target strand; obtaining a first quantitative hybridization fingerprint 
corresponding to said first biological or environmental sample; 
extracting DNA or RNA from a second biological or environmental 
sample; mixing said DNA or RNA extracted from said second 
biological or environmental sample with a high molar excess of at 
least one labeled oligonucleotide probe of arbitrary sequence; 
hybridizing said mixture derived from said second biological or 
environmental sample with an array of arbitrary sequence capture 
probes, using conditions of temperature and ionic strength under 
which neither the labeled probe(s), nor capture probes alone will 
stably hybridize with the DNA target, but under which capture and 
labeled probes, when tandemly hybridized to a target strand to form 
a longer, contiguously base-stacked combined duplex region, will 
result in stable capture of the target strand; obtaining a second 
quantitative hybridization fingerprint corresponding to said second 
biological or environmental sample; and comparing the quantitative 
hybridization fingerprint obtained from said first biological or 
environmental sample with the quantitative hybridization 
fingerprint obtained from said second biological or environmental 
sample. 

The present invention also provides a method of direct 
profiling of gene expression at the level of transcription, comprising 
the steps of: mixing bulk messenger RNA extracted from a biological 
sample with a high molar excess of at least one labeled 
oligonucleotide probe of arbitrary sequence; hybridizing said mixture 
with an array of arbitrary sequence capture probes, using conditions 
of temperature and ionic strength under which neither the labeled 
probe(s), nor capture probes alone will stably hybridize with the RNA 
target, but under which capture and labeled probes, when tandemly 
hybridized to a target strand to form a longer, contiguously base- 
stacked combined duplex region, will result in stable capture of the 
RNA transcript; and comparing said hybridization fingerprint with 
different hybridization fingerprints obtained from different biological 
samples. The arbitrary sequence oligonucleotide probe array may be 
formed on a flat surface. The arbitrary sequence oligonucleotide 
probe array can be formed within a flowthrough layer of channel 
glass or porous silicon. In one form, a multiplicity of labeled primers is 
mixed with the RNA sample. When a multiplicity of distinguishable 
labels is employed, each may be incorporated into a different 
labeled probe. The arbitrary sequence oligonucleotide probe array 
may be formed on a flat surface or formed within a flowthrough 
layer of channel glass or porous silicon. 

Preferably, the labeled probes and said capture probes are of length 
6-8 bases. 

Also provided is a method for directly analyzing and 
comparing nucleic acid samples of high genetic complexity, 
comprising the steps of: extracting DNA or RNA from a biological 
sample; adding at least one labeled oligonucleotide probe of arbitrary 
sequence to the extracted nucleic acid and hybridizing the mixture 
with an array of arbitrary sequence capture probes, using conditions 
of temperature and ionic strength under which neither the labeled 
probe(s), nor capture probes alone will stably hybridize with the 
target strands, but under which capture and labeled probes, when 
tandemly hybridized to a target strand to form a longer, contiguously 
base-stacked combined duplex region, will result in stable capture of 
the target strand; comparing the hybridization fingerprint with 
fingerprints obtained from different biological samples; eluting 
bound target strands from any desired hybridization cell in the 
array, by applying a denaturant solution to the desired location in 
the array; and further analyzing said eluted target strands, using the 
combined sequence of the capture and labeled probes to define a 
longer primer for PCR amplification or dideoxy sequencing. 

1. A method of detecting sequence polymorphisms between 
samples of genomic DNA, comprising the steps of: 

amplifying a first subset of genomic DNA sequences from 
genomic DNA extracted from a first individual by a polymerase chain 
reaction using one or more oligonucleotide primers of arbitrary 
sequence; 

introducing at least one label into said first amplified 
subset of genomic DNA; 

combining said first amplified subset of genomic DNA 
with a two-dimensional array of surface-bound oligonucleotide 
probes under hybridizing conditions to form a first quantitative 
hybridization fingerprint for said first subset of genomic DNA 
sequences; 

amplifying a second subset of genomic DNA sequences 
from genomic DNA extracted from a second individual by a 
polymerase chain reaction using said one or more oligonucleotide 
primers of arbitrary sequence; 

introducing at least one label into said second amplified 
subset of genomic DNA; 

combining said second amplified subset of genomic DNA 
with said two-dimensional array of surface-bound oligonucleotide 
probes under hybridizing conditions to form a second quantitative 
hybridization fingerprint for said subset of genomic DNA sequences; 

comparing said first quantitative hybridization 
fingerprint to said second quantitative hybridization fingerprint; and 

detecting sequence polymorphisms in said samples of 
genomic DNA by detecting differences between said first quantitative 
hybridization fingerprint and said second quantitative hybridization 
fingerprint. 



2. The method of claim 1, wherein said one or more 
oligonucleotide primers of arbitrary sequence has a length of 8 to 10 
nucleotides. 

3. The method of claim 1, wherein said label is introduced 
by a method selected from the group consisting of incorporating 
labeled substrate in the PCR reaction and labeling the PCR fragments. 

4. The method of claim 1, wherein said one or more 
oligonucleotide primers of arbitrary sequence has a G+C content of 
55-65%. 

5. The method of claim 1, wherein said one or more 
oligonucleotide primers of arbitrary sequence does not have a 
secondary structure. 

6. The method of claim 1, wherein said one or more 
oligonucleotide primers of arbitrary sequence does not have 
sequences corresponding to Alu, LINE, SINE or other repetitive 
sequence elements. 

7. The method of claim 1, wherein the number of different 
oligonucleotide probes arrayed on the surface is at least 100. 

8. The method of claim 1, wherein the number of different 
oligonucleotide probes arrayed on the surface is at least 1000. 

7. A method of detecting sequence polymorphisms in a 
genomic DNA sample, comprising the steps of: 

amplifying a first subset of genomic DNA sequences from 
genomic DNA extracted from a first individual by a polymerase chain 
reaction using a multiplicity of defined sequence oligonucleotide 
primer pairs directed toward a corresponding multiplicity of known 
genomic regions; 

labeling said first amplified subset of genomic DNA; 

combining said first amplified subset of genomic DNA 
with a two-dimensional array of surface-bound oligonucleotide 
probes under hybridizing conditions to form a first quantitative 
hybridization fingerprint for said first subset of genomic DNA 
sequences; 

amplifying a second subset of genomic DNA sequences 
from genomic DNA extracted from a second individual by a 
polymerase chain reaction using said multiplicity of defined sequence 
oligonucleotide primer pairs directed toward a corresponding 
multiplicity of known genomic regions; 

labeling said second amplified subset of genomic DNA; 

combining said second amplified subset of genomic DNA 
with said two-dimensional array of surface-bound oligonucleotide 
probes under hybridizing conditions to form a second quantitative 
hybridization fingerprint for said subset of genomic DNA sequences; 

comparing said first quantitative hybridization 
fingerprint to said second quantitative hybridization fingerprint; and 

detecting polymorphisms in said samples of genomic DNA 
by detecting differences between said first quantitative hybridization 
fingerprint and said second quantitative hybridization fingerprint. 

8. A method for profiling of gene expression at the level of 
transcription, comprising the steps of: 

extracting RNA from a biological sample; 

conducting reverse transcriptase-arbitrary primer PCR to 
amplify subsets of expressed sequences; 

labeling said amplified subsets of expressed sequences; 

hybridizing the labeled, amplified subsets of expressed 
sequences with an array of oligonucleotide probes of arbitrary 
sequence to produce a quantitative hybridization fingerprint; and 

detecting differences in gene expression by comparing 
said quantitative hybridization fingerprint with quantitative 
hybridization fingerprints obtained from other experiments 
performed previously for other biological samples. 

9. An improved method of preparing oligonucleotide arrays 
for use in hybridization analyses, comprising the steps of: 

chemically synthesizing a desired set of oligonucleotide 
probes using 3'-amino-C3 controlled pore glass support material to 
produce completed desired oligonucleotides; 

cleaving said completed desired oligonucleotides from 
said support material in concentrated ammonium hydroxide to yield 
deprotected oligonucleotides bearing aminopropanol groups at their 
3'-termini; 

cleaning a glass or silicon dioxide surface with organic 
solvents and drying at elevated temperature; 

applying a quantity of oligonucleotides bearing 
aminopropanol groups at their 3'-termini in aqueous solution to said 
surface of said clean, dry glass or silicon dioxide; 

allowing covalent bonding of said oligonucleotides bearing 
aminopropanol groups at their 3'-termini to said surface through 
terminal aminopropanol functions; and 

removing unbound oligonucleotides from the surface by 
washing with water. 

10. A method of species, strain, subtype or gender 
identification, comprising the steps of: 

extracting genomic DNA from an organism, tissue or cells; 

amplifying a subset of genomic DNA sequences by a polymerase 
chain reaction using one or more oligonucleotide primers of arbitrary 
sequence; 

introducing at least one label into said amplified subset of 
genomic DNA; 

combining said amplified labeled subset of genomic DNA with a 
two-dimensional array of surface-bound oligonucleotide probes 
under hybridizing conditions to form a quantitative hybridization 
fingerprint for said genomic DNA; and 

identifying the species, strain, subtype or gender of the 
organism, by comparing said hybridization fingerprint with a 
database of hybridization fingerprints previously obtained from 
known species, strains, subtypes or genders. 



11. A method of analyzing and comparing mixed populations 
25 of organisms in biological or environmental samples, comprising the 
steps of: 

extracting DNA or RNA from a first biological or environmental 
sample; 

amplifying a first subset of nucleic acid sequences from said 
DNA or RNA extracted from said first biological or environmental 
sample by a polymerase chain reaction using one or more 
oligonucleotide primers of arbitrary sequence; 

introducing at least one label into said first subset of nucleic 
acid sequences; 

combining said first labeled, amplified subset of nucleic acid 
sequences with a two-dimensional array of surface-bound 
oligonucleotide probes under hybridizing conditions to form a first 
quantitative hybridization fingerprint for said first biological or 
environmental sample; 

extracting RNA or DNA from a second biological or 
environmental sample; 

amplifying a second subset of nucleic acid 
sequences from said DNA or RNA extracted from said second 
biological or environmental sample by a polymerase chain reaction 
using one or more oligonucleotide primers of arbitrary sequence; 

introducing at least one label into said second subset of nucleic 
acid sequences; 

combining said second labeled, amplified subset of nucleic acid 
sequences with said two-dimensional array of surface-bound 
oligonucleotide probes under hybridizing conditions to form a second 
quantitative hybridization fingerprint for said second biological or 
environmental sample; 

comparing said first quantitative hybridization fingerprint to 
said second quantitative hybridization fingerprint; and 

detecting differences in the population of organisms in said 
different biological or environmental samples, by detecting 
differences between said first quantitative hybridization fingerprint 
and said second quantitative hybridization fingerprint. 
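
The final two steps of claim 11 amount to a probe-by-probe comparison of two quantitative fingerprints. The sketch below assumes that each fingerprint is a mapping from probe sequence to normalized signal, and it flags probes whose signal changes at least two-fold. The fold threshold, the noise floor and the probe sequences are hypothetical values chosen for illustration.

```python
import math

def differing_probes(first, second, min_fold=2.0, floor=0.05):
    """Return {probe: log2(second/first)} for probes changing by >= min_fold.

    Signals below `floor` are raised to `floor` so that background-level
    readings do not produce very large or undefined ratios.
    """
    threshold = math.log2(min_fold)
    changed = {}
    for probe in sorted(set(first) | set(second)):
        a = max(first.get(probe, 0.0), floor)
        b = max(second.get(probe, 0.0), floor)
        log_ratio = math.log2(b / a)
        if abs(log_ratio) >= threshold:
            changed[probe] = log_ratio
    return changed

# Hypothetical normalized signals for three probes in two samples.
sample_1 = {"GATCCAGT": 1.00, "TTGACCGA": 0.40, "CAGGTTAC": 0.02}
sample_2 = {"GATCCAGT": 0.95, "TTGACCGA": 0.10, "CAGGTTAC": 0.60}
differences = differing_probes(sample_1, sample_2)
```

Here the first probe is essentially unchanged, the second loses signal and the third gains signal, so only the latter two would be reported as differences between the samples.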

12. A method of direct genomic fingerprinting of nucleic acids 
extracted from a biological or environmental sample, comprising the 
steps of: 

mixing genomic DNA or RNA extracted from a biological sample 
with a high molar excess of at least one labeled oligonucleotide probe 
of arbitrary sequence; 

hybridizing said mixture with an array of arbitrary sequence 
capture probes, using conditions of temperature and ionic strength 
under which neither the labeled probe(s), nor capture probes alone 
will stably hybridize with the DNA target, but under which capture 
and labeled probes, when tandemly hybridized to a target strand to 
form a longer, contiguously base-stacked combined duplex region, 
will result in stable capture of the target strand; and 

comparing the hybridization fingerprint with genomic 
fingerprints obtained from different biological samples. 
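
The tandem hybridization condition of claim 12 can be sketched as a search for positions where the capture probe site and the labeled probe site lie directly next to each other on the target strand. The sketch makes simplifying assumptions: only perfect Watson-Crick matches are considered, the target is written 5' to 3', and the effects of temperature and ionic strength are not modeled. All names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def binding_sites(target, probe):
    """Start indices in `target` where `probe` can pair without mismatch."""
    site = reverse_complement(probe)
    return [i for i in range(len(target) - len(site) + 1)
            if target.startswith(site, i)]

def tandem_sites(target, capture, labeled):
    """(capture_start, labeled_start) pairs whose sites abut with no gap."""
    labeled_starts = set(binding_sites(target, labeled))
    pairs = []
    for c in binding_sites(target, capture):
        # The labeled probe may sit immediately before or after the capture site.
        for start in (c - len(labeled), c + len(capture)):
            if start in labeled_starts:
                pairs.append((c, start))
    return pairs

# Hypothetical 8-base probes and a target carrying both sites side by side.
capture_probe = "GGATCCGT"
labeled_probe = "TCAGACTT"
target = "TTACGGATCCAAGTCTGACC"
adjacent = tandem_sites(target, capture_probe, labeled_probe)

# A single intervening base breaks the contiguous stack.
gapped = tandem_sites("TTACGGATCCTAAGTCTGACC", capture_probe, labeled_probe)
```

Only the first target would be expected to give stable capture under the claimed conditions, because a gap between the two probes interrupts the contiguously base-stacked duplex.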

13. The method of claim 12, wherein the arbitrary sequence 
oligonucleotide probe array is formed on a flat surface. 

14. The method of claim 12, wherein the arbitrary sequence 
oligonucleotide probe array is formed within a flowthrough layer of 
channel glass or porous silicon. 

15. The method of claim 12, wherein a multiplicity of labeled 
primers is mixed with the nucleic acid extracted from a biological or 
environmental sample. 

16. The method of claim 12, wherein a multiplicity of 
distinguishable labels are used, each incorporated into a different 
labeled probe. 

17. The method of claim 12, wherein said labeled probes and 
said capture probes are 8-10 bases in length. 

18. A method of directly analyzing and comparing mixed 
populations of organisms in biological or environmental samples, 
comprising the steps of: 

extracting DNA or RNA from a first biological or environmental 
sample; 

mixing said DNA or RNA extracted from said first biological 
or environmental sample with a high molar excess of at least one 
labeled oligonucleotide probe of arbitrary sequence; 

hybridizing said mixture derived from said first biological or 
environmental sample with an array of arbitrary sequence capture 
probes, using conditions of temperature and ionic strength under 
which neither the labeled probe(s), nor capture probes alone will 
stably hybridize with the DNA target, but under which capture and 
labeled probes, when tandemly hybridized to a target strand to form 
a longer, contiguously base-stacked combined duplex region, will 
result in stable capture of the target strand; 

obtaining a first quantitative hybridization fingerprint 
corresponding to said first biological or environmental sample; 

extracting DNA or RNA from a second biological or 
environmental sample; 

mixing said DNA or RNA extracted from said 
second biological or environmental sample with a high molar excess 
of at least one labeled oligonucleotide probe of arbitrary sequence; 

hybridizing said mixture derived from said second biological or 
environmental sample with an array of arbitrary sequence capture 
probes, using conditions of temperature and ionic strength under 
which neither the labeled probe(s), nor capture probes alone will 
stably hybridize with the DNA target, but under which capture and 
labeled probes, when tandemly hybridized to a target strand to form 
a longer, contiguously base-stacked combined duplex region, will 
result in stable capture of the target strand; 

obtaining a second quantitative hybridization fingerprint 
corresponding to said second biological or environmental sample; and 

comparing the quantitative hybridization fingerprint obtained from 
said first biological or environmental sample with the quantitative 
hybridization fingerprint obtained from said second biological or 
environmental sample. 

19. A method of direct profiling of gene expression at the 
level of transcription, comprising the steps of: 

mixing bulk messenger RNA extracted from a biological sample 
with a high molar excess of at least one labeled oligonucleotide probe 
of arbitrary sequence; 

hybridizing said mixture with an array of arbitrary sequence 
capture probes, using conditions of temperature and ionic strength 
under which neither the labeled probe(s), nor capture probes alone 
will stably hybridize with the RNA target, but under which capture 
and labeled probes, when tandemly hybridized to a target strand to 
form a longer, contiguously base-stacked combined duplex region, 
will result in stable capture of the RNA transcript; and 

comparing said hybridization fingerprint with different 
hybridization fingerprints obtained from different biological samples. 

20. A method for directly analyzing and comparing nucleic 
acid samples of high genetic complexity, comprising the steps of: 

extracting DNA or RNA from a biological sample; adding at least 
one labeled oligonucleotide probe of arbitrary sequence to the 
extracted nucleic acid and hybridizing the mixture with an array of 
arbitrary sequence capture probes, using conditions of temperature 
and ionic strength under which neither the labeled probe(s), nor 
capture probes alone will stably hybridize with the target strands, 
but under which capture and labeled probes, when tandemly 
hybridized to a target strand to form a longer, contiguously base- 
stacked combined duplex region, will result in stable capture of the 
target strand; 

comparing the hybridization fingerprint with fingerprints 
obtained from different biological samples; 

eluting bound target strands from any desired hybridization 
cell in the array, by applying a denaturant solution to the desired 
location in the array; and 

further analyzing said eluted target strands, using the 
combined sequence of the capture and labeled probes to define a 
longer primer for PCR amplification or dideoxy sequencing. 
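
The last step of claim 20 uses the combined sequence of the two tandemly hybridized probes as a longer primer. A minimal sketch of that combination follows. It assumes that both probes are written 5' to 3' and that the caller knows which probe lies on the 5' side of the contiguous duplex. The function name, its flag and the sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def combined_primer(capture, labeled, labeled_is_5prime):
    """Join two abutting probes into one primer, written 5' to 3'."""
    return labeled + capture if labeled_is_5prime else capture + labeled

# Hypothetical probes that bind side by side on the target region below.
capture_probe = "GGATCCGT"
labeled_probe = "TCAGACTT"
target_region = "ACGGATCCAAGTCTGA"  # capture site followed by labeled site

# The labeled site lies 3' of the capture site on the target, so on the
# complementary (probe) strand the labeled probe is the 5' partner.
primer = combined_primer(capture_probe, labeled_probe, labeled_is_5prime=True)
```

The resulting 16-base primer is the reverse complement of the full contiguous target region, which is the property that would let it prime PCR amplification or dideoxy sequencing of the eluted strand.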

arbitrary sequence PCR -> anonymous fragments 
directed amplification -> mapped STS fragments 

SPECIFIC COLLECTION OF GENOMIC SEQUENCES 
DISTRIBUTED THROUGHOUT THE GENOME 

hybridization of labeled amplicons to arrays of 
arbitrary sequence oligonucleotide probes 

HYBRIDIZATION FINGERPRINTS 
REFLECT SAMPLED SEQUENCE 
OF AMPLIFIED GENOMIC REGIONS; 

SEQUENCE POLYMORPHISMS SEEN AS 
DIFFERENCES IN HYBRIDIZATION PATTERN 
PRODUCED FROM DIFFERENT INDIVIDUALS; 

THOSE PROBES THAT DETECT SEQUENCE 
POLYMORPHISMS PLACED ONTO A SINGLE 
GENOSENSOR CHIP FOR SIMULTANEOUS 
ANALYSIS OF NUMEROUS ASOF MARKERS 



WO 97/22720 PCT/US96/20628 

CHEMICAL SYNTHESIS OF 3'-AMINOPROPANOL-OLIGONUCLEOTIDE 

3'-Amino-Modifier C3 CPG (Fmoc-N-H; O-Succinyl-lcaa-CPG; O-DMT) 

-> Multiple Cycles of Phosphoramidite Coupling 

Fmoc-N-H; O-Succinyl-lcaa-CPG; O-3'-Oligonucleotide 

-> 2 hr Rm. Temp. in Conc. Ammonia (cleavage from support) 

Fmoc-N-H; O-3'-Oligonucleotide 

-> 6 hr 55°C in Conc. Ammonia (deprotection) 

H-N-H; O-3'-Oligonucleotide 



[Structural drawing; recoverable labels: O-CH-CH2-O-P(=O)-O-3' (oligonucleotide probe); H2C-NH3; O] 



ARBITRARY SEQUENCE FINGERPRINTING BY TANDEM HYBRIDIZATION 